Educating moral sensitivity in business: An experimental study to evaluate the effectiveness of a serious moral game

Serious games have emerged as a promising new form of education and training. Even though the benefits of serious games for education are undisputed, there is still a need for research on the efficacy of such games. The main goal of our research is to examine the effectiveness of a serious moral game, uFin: The Challenge, that was designed to promote moral sensitivity in business, a precondition of ethical decision-making and behavior and a core competency of moral intelligence. A second goal is to examine the role of metacognitive prompting and prosocial nudging in influencing learning effectiveness. Participants (N = 345) took part in an experimental game-based intervention study and completed a pre- and post-test questionnaire assessing moral sensitivity. The analyses of both questionnaire and game data suggest that merely playing this game is effective in promoting moral sensitivity. Neither self-reflection nor exposure to prosocial nudges, however, was found to improve learning effectiveness. On the contrary, those interventions even decreased the learning outcome in some cases. Overall, the findings demonstrate the potential of game-based learning in the moral domain. An important avenue for future research is to examine other ways of increasing the effectiveness of the game.


Introduction
Given the pervasive trend toward gameplay and the challenge of creating more engaging educational practices, serious games (often referred to as educational games) have become increasingly popular among professional trainers and educators. These games are used for different purposes (e.g., training professional skills, languages, and intercultural skills) and in different contexts (e.g., history, management, mathematics, military, physics) (De Freitas & Liarokapis, 2011; Guillén-Nieto & Aleson-Carbonell, 2012; Kwon & Lee, 2016; Ritterfeld, Cody, & Vorderer, 2009; Vlachopoulos & Makri, 2017). To promote more satisfactory ways of teaching business ethics and internalizing ethical values in business (Sholihin, Sari, Yunarity, & Ilyana, 2020), there is also a growing interest in the development of serious moral games (SMGs), that is, computer or video games designed for the purpose of developing moral competencies (Christen, Faller, Götz, & Müller, 2013; Flanagan & Nissenbaum, 2014; Schrier, 2015; Schrier & Gibson, 2010). This interest is embedded in a broader discussion on ethics and computer games, the role of ethical values in gameplay, and the implications of this [...] & Lapsley, 2005; Pedersen, 2009; Schmocker, Tanner, Katsarov, & Christen, 2019; Sparks & Hunt, 1998). Although promoting the awareness of moral issues has emerged as a topic to be integrated in business-ethics training (e.g., Gautschi & Jones, 1998; Murphy & Boatright, 1994; Ritter, 2006), to our knowledge, there are very few SMGs, and even fewer that have centered on the development of moral sensitivity (MS). Our second goal is to investigate the role of two particular interventions in improving learning: induction of self-reflection and exposure to prosocial nudges. For this purpose, we manipulate in the experiment whether people are prompted to reflect on their actions and whether they are faced with a prosocial or a standard version of the game.
Comparing these two strategies seemed particularly interesting to us, since they are based on rather different mechanisms. The first strategy is based on the idea of supporting learning by prompting people to engage in deliberate thinking about their actions, whereas the second is based on the idea of supporting learning by subtly steering people's decision-making through mere exposure to particular situational stimuli.

Moral sensitivity in business
In the ethical decision-making, business, and organizational literature, MS (also referred to as ethical sensitivity or moral awareness) has been recognized as the critical first step of ethical decision-making and behavior in organizations (Jordan, 2007; Miller, Rodgers, & Bingham, 2014; Rest, 1986; Weaver, 2007) as well as a vital competency of moral intelligence (Narvaez & Lapsley, 2005; Tanner & Christen, 2014). To quote Butterfield and colleagues, MS includes realizing whether a given set of actions "could affect the interests, welfare, or expectations of the self or others in a fashion that may conflict with one or more ethical standards" (Butterfield, Treviño, & Weaver, 2000, p. 982). Examples that qualify as moral issues in the business domain are deception, misreporting, fraud, bribery, preferential treatment, conflicts of interest, violations of privacy, and threats to employee health and safety, to name just a few. Most companies nowadays adopt some set of moral and legal standards (e.g., honesty, fairness, responsibility, anti-corruption, due diligence) in their codes of ethics/conduct to provide a framework for preventing misconduct (see, e.g., KPMG, 2015). It is important to acknowledge, however, and not only for the business domain, that simply communicating the core values of an organization is not sufficient to have them put into practice.
To highlight the importance of noticing the moral component in a situation, i.e., of being morally sensitive, consider the following example. Imagine a bank that manages the portfolios of high net worth individuals. Because a high net worth client's portfolio is far below the agreed benchmarks, the financial advisors discuss producing tables with a different blended benchmark that makes it look as if the client's portfolio has performed better than it has. The discussion reveals different views. One financial advisor considers "glossing over" a common business practice. The other financial advisor, however, is concerned and feels uncomfortable about deceiving the client. He thinks that this behavior conflicts with the organization's moral standard of being honest with clients. He therefore suggests finding other ways to solve the problem with the client.
The relevance of MS seems obvious: no moral problem can be addressed and solved, and no moral standard can be implemented, until it has been recognized and considered as important in the specific situation (Clarkeburn, 2002; Rest, 1986; Sparks & Hunt, 1998). If the financial advisors failed to notice that the situation contains a moral issue (in this case, the risk of violating honesty by misrepresentation), the decision process would be different and deceiving the client would be likely (Butterfield et al., 2000).
As a consequence, MS has been acknowledged as playing a significant role in many professional fields, such as nursing, medicine, and business (see Jordan, 2007; Miller et al., 2014; Weaver, 2007). However, because reality is complex and ambiguous, moral aspects are rarely obvious in daily life. MS is neither a self-evident nor a stable attribute, but rather an ability that can be shaped by experience and socialization. Indeed, past research suggests that individuals differ largely in their responsiveness to moral issues (Jordan, 2009; Reynolds, 2008; Schmocker et al., 2019). For example, whereas some individuals are highly responsive to examples of unfairness or harm to others, and rapidly "see" that a moral code of conduct may be violated in a particular situation, others are more likely to miss or to be "blind" to such transgressions (Gioia, 1992; Haidt, 2003; Jordan, 2009; Schmocker et al., 2019; Pedersen, 2009). For instance, prior studies have shown that business managers and bankers were less likely than academics or employees of nongovernmental organizations (NGOs) to detect moral-related issues, as compared with business-related issues, in ambiguous vignettes (Jordan, 2009; Schmocker et al., 2019). These findings suggest that expertise in the business domain can dominate one's encoding of information to such an extent that ethically relevant information is suppressed (Gioia, 1992; Jordan, 2009). They also highlight the relevance of socialization and education, through which individuals adopt and learn to rely upon schemas that incorporate or do not incorporate moral dimensions (Gioia, 1992).
Research in the domain of behavioral ethics has identified numerous psychological biases and situational factors that can facilitate ethical fading and, hence, moral blindness (e.g., Bazerman & Tenbrunsel, 2011a, 2011b; Sezer, Gino, & Bazerman, 2015; Tenbrunsel & Messick, 2004; Zhang, Fletcher, Gino, & Bazerman, 2015). For example, goal setting and being locked in a business frame have been identified as two factors that facilitate moral blindness and that are especially critical within the business context. Goal setting typically implies a narrow focus on goal-related issues that can blind people to other considerations (such as moral issues) (Barsky, 2008; Ordóñez, Schweitzer, Galinsky, & Bazerman, 2009). In addition, as a result of a rigid business frame, people can become fixed on viewing reality solely through an "economic lens" while being unable to view the problem from an ethical point of view (Gioia, 1992; Palazzo, Krings, & Hoffrage, 2012; Tenbrunsel & Messick, 1999).
Importantly, several researchers have emphasized that MS is not just a cognitive process but that it also draws on affective reactions. Spontaneous "gut feelings" or morally charged affective responses, such as outrage, anger, guilt or contempt, or simply feeling concerned, may serve as signals to the individual that essential norms have been violated and ethical compromises may not be tolerated (Damasio, 1994; Haidt, 2003; Tetlock, Kristel, Elson, Green, & Lerner, 2000). This view does not imply that cognitive and affective components are distinct. Rather, the assumption is that when a situation or action fits an existing schema of a moral issue, affective and cognitive reactions can be triggered (e.g., Smetana, Jambon, & Ball, 2014). Furthermore, MS can also benefit from empathic concerns, including perspective-taking skills (Davis, 1980; Eisenberg & Fabes, 1990; Hoffman, 2000), as they enable an individual to be responsive to the needs of others and to understand the situation from different perspectives (Narvaez, 2010; Tanner & Christen, 2014; Tenbrunsel & Smith-Crowe, 2008). Indeed, a recent study has provided empirical evidence that a sense of caring and relational connection to others (i.e., empathic concerns) is especially likely to facilitate MS (Schmocker, Tanner, Katsarov, & Christen, 2021). In line with these claims, our design also assesses potential changes in affective responses and empathic concerns through playing when testing the effectiveness of our game in improving MS (for the sake of simplicity, we subsume affective and empathic responses under the term MS-related constructs).

Serious moral games
In recent years, an increasing interest in the creation of SMGs to support ethics education has been observed (Christen et al., 2013; Flanagan & Nissenbaum, 2014; Schrier, 2015, 2019). Only a few have focused specifically on the business context. Those examples, however, approach this topic mostly from a cognitive, reasoning perspective with the goal of improving ethical reflection and decisions (Jagger, Siala, & Sloan, 2016). Some illustrative examples of business ethics games of this type are Deepwater (Buck, 2012) and Core Values™ (The Ethics Game, 2014). Moreover, to our knowledge, the effectiveness of such SMGs has rarely been investigated. One exception is the business ethics game Marketing Mayhem, which was designed to help students improve moral awareness and decision-making skills (Jagger et al., 2016). In evaluating the effectiveness of Marketing Mayhem, however, the authors relied upon participants' self-reported perceptions of the game and of their own learning improvements rather than assessing individuals' changes in moral awareness and judgment with independent measures. Despite promising examples and developments in the domain of business ethics education, we conclude that rigorous empirical research providing evidence of the effectiveness of SMGs is still scarce. Furthermore, it is essential to point out that SMGs are rarely centered on supporting the development of MS. We are aware of only one study that successfully developed a game to promote moral sensitivity and knowledge in the context of responsible conduct of research (Melcer et al., 2020).
In this research, we investigate the potential of uFin: The Challenge to promote MS related to business contexts. Our aim is to contribute to filling this gap by conducting an experimental study that meets several requirements of methodologically sound best practices for assessing game-based learning (All, Castellar, & van Looy, 2021). This includes comparing an experimental group (playing our SMG) with a control group (playing another game without ethical content), using pre- and post-test measures of MS to account for pre-existing differences in MS, and assigning the participants randomly to the conditions. Importantly, to examine whether playing the SMG helped to improve MS, we do not build on self-reports but instead draw on two separately assessed measures. First, we rely on a rigorously tested and validated measure of business-related MS (Schmocker et al., 2019, 2021) that was administered prior to and after playing the games. To tackle test-retest effects that may be likely when individuals are faced with the same measure twice in the pre- and post-test phases, we created two parallel blocks of questions (split-half methodology). Second, we draw on game scores assessed during playing. To examine the effectiveness of game-based learning on MS and the role of additional interventions to promote MS, uFin: The Challenge was developed.

uFin: The Challenge
This SMG is a point-and-click game in German that runs on a tablet. In uFin, players face a futuristic setting. In the role of an internal organization-development consultant at an interstellar bank, players' mission is to fly to another planet to identify strategies for optimizing the organization of a planetary investment bank. The players are under time and performance pressure; their job in the company is at risk if they do not work efficiently. Unknown to the players, this subsidiary is struggling with several legal and ethical issues (e.g., accounting fraud, corruption, blackmail, exploitation, mobbing). The question is: Are the players capable of noticing these problems and taking them seriously enough to investigate and report? To succeed in uFin, players have to pay attention not only to business-related aspects (such as performance and efficiency) but also to moral issues, and they have to be responsive to the concerns of others. That is, MS (including empathic and affective responsiveness) is required.
This means that the design decisions made when creating the game were strongly driven by the aim that uFin should serve as an experimental setup to influence MS. The use of complex graphics and 3D animations was excluded mainly for economic reasons; instead, the look and feel of uFin was designed to bear some resemblance to two-dimensional adventure games. The use of both textual and graphical cues nevertheless allowed for a broad set of stimuli that have the potential to influence MS and that can be manipulated to create a prosocial nudging version of the game. Behavioral options within the game concern interactions with nonplayer characters or objects. During these interactions, players are faced with many decisions. For example, a meaningful interaction with an object concerns a situation where the players must decide whether or not to check the content of a recycle bin that may contain sensitive information about the accountant. An example of an interaction with a nonplayer character concerns the situation where the boss of the subsidiary asks the players not to disclose sensitive information about the firm to the parent company. The players must decide whether or not to comply with this request. Both situations relate to ethical issues of confidentiality or dishonesty.
In response to their decisions, players continuously receive immediate feedback regarding the three dimensions shown in Fig. 1. Players receive 1) moral sensitivity points (MS points), whenever they respond to moral issues (e.g., when they react to unjust behavior, prefer an honest answer over a dishonest one, ask questions about norm violations, care about confidentiality); 2) empathic concern points (EC points), whenever they treat nonplayer characters nicely and respectfully (e.g., when they show interest in the concerns or the well-being of others); and 3) business sensitivity points (BS points), whenever they show interest in business-related issues (such as performance or efficiency). (The points have different names in the game so that players are left somewhat in the dark as to which topic the points refer to.) Players lose points when they fail to respond to moral issues, when they treat others impolitely, or when they fail to respond to business issues. The classification of the behavioral options as relating to MS, EC, and BS points was performed by all four authors and discussed until mutual agreement was reached. Both plausibility and design considerations played a role when determining the points for each single option (e.g., to avoid a certain category being systematically biased toward generating scores that are too high compared to the other categories).
Overall, the points serve to numerically represent players' progress and to sensitize them to different value dimensions. Furthermore, as the players proceed through the game, they regularly come across new clues. Their mission is to select the three most important issues from these clues to report to the parent company. The clues are short written statements. Examples of ethically relevant clues are "The head of the department has manipulated the annual financial statements", or "The head of the department has blackmailed the accountant." Examples of ethically irrelevant clues are "15 positions have been vacant for a long time", or "Further major orders are expected this year." Players can only carry up to three clues with them at once: This novel game mechanism forces players to prioritize issues every time they come across a new clue, because they need to give up a former clue to collect a new one. To detect issues (find clues), players can communicate and interact with five nonplayer characters who are located in different rooms of the bank. Each clue presented in the game was evaluated with regard to its ethical relevance. This was done again among all authors until full consensus was reached. Each clue was allocated 1 to 7 points, with higher scores representing higher ethical relevance.
Once players return to Earth from their task, they receive an overview of their performance during game playing, which includes three types of game-based information (see Fig. 2), namely: 1) the three clues that they finally selected, 2) the sum of the MS, EC, and BS scores accumulated over the course of the game, and 3) their "detection rate", i.e., how much they have found out about the illegal/ethical problems in this subsidiary. The detection rate is the sum of the ethical relevance scores of the three clues finally selected. It can range from 1 to 21. Thus, a detection score of 21 means that the player has found the three most relevant clues (7 points each). The players have the opportunity to play the game again to try out other paths and to potentially achieve better scores.
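The clue mechanism and the detection rate can be illustrated with a minimal sketch. It assumes a simple list-based inventory; the clue identifiers, the relevance values assigned to them, and the function names `pick_up` and `detection_rate` are hypothetical and are not taken from the game's implementation. Only the limit of three carried clues, the 1 to 7 relevance scale, and the summing rule come from the description above.

```python
from typing import Dict, List, Optional

MAX_CLUES = 3  # players can carry at most three clues at once

# Hypothetical relevance scores on the 1-7 scale; the actual values
# agreed on by the authors are not reported in the text.
CLUE_RELEVANCE: Dict[str, int] = {
    "manipulated_financial_statements": 7,
    "blackmailed_accountant": 7,
    "positions_vacant": 2,
    "major_orders_expected": 1,
}


def pick_up(carried: List[str], new_clue: str,
            drop: Optional[str] = None) -> List[str]:
    """Return the carried clues after picking up new_clue.

    If the inventory is already full, the player must name a carried
    clue to give up, which forces a prioritization decision.
    """
    updated = list(carried)
    if len(updated) >= MAX_CLUES:
        if drop is None or drop not in updated:
            raise ValueError("inventory full: name a carried clue to give up")
        updated.remove(drop)
    updated.append(new_clue)
    return updated


def detection_rate(selected: List[str]) -> int:
    """Sum the ethical relevance scores of the finally selected clues."""
    if len(selected) > MAX_CLUES:
        raise ValueError("at most three clues can be reported")
    return sum(CLUE_RELEVANCE[clue] for clue in selected)


carried: List[str] = []
for clue in ["positions_vacant", "major_orders_expected",
             "manipulated_financial_statements"]:
    carried = pick_up(carried, clue)
# A fourth clue can only be collected by giving up an earlier one.
carried = pick_up(carried, "blackmailed_accountant",
                  drop="major_orders_expected")
print(detection_rate(carried))  # 2 + 7 + 7 = 16
```

In this sketch, a player who keeps an ethically irrelevant clue ends with a lower detection rate than one who replaces it with a relevant clue, which is the prioritization the mechanism is meant to elicit.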

Rationale of hypotheses
The first goal of the current experimental study involves examining whether playing uFin: The Challenge helps to improve MS. In general, serious games are considered to support learning due to at least two key characteristics of games: they offer the possibility of feedback and repetition. Both are known to be strong factors that influence learning. Feedback assists in adjusting and reinforcing behaviors (Moreno & Mayer, 2005; Ritterfeld et al., 2009; Wouters et al., 2013). Repetition, on the other hand, helps to strengthen memory retention and reactions and is essential to building up habits (Hintzman & Block, 1971; Miller, Shenhav, & Ludvig, 2019). Games are also considered to influence learning by increasing individuals' motivation to learn due to the engaging nature of gaming (Westera, 2017; Wouters et al., 2013). Further factors that have been proposed to make playing an educational game intrinsically motivating are experiencing the task as a challenge, awakening curiosity and fantasy, and providing opportunities for making autonomous choices (Malone, 1981; Przybylski, Rigby, & Ryan, 2010).
During the iterative process of game development and piloting, we considered users' responses to further improve the game and to develop it in accordance with the characteristics mentioned above (i.e., feedback, repeated practice, engaging nature, challenges, and autonomous choices). Moreover, we strongly relied on the idea that the development of moral competencies and virtuous behavior requires practice (Narvaez & Lapsley, 2005; Nucci, Narvaez, & Krettenauer, 2014). This has been implemented by integrating exercises into a business context narrative that requires players to engage in interactive activities and to experience what happens in response. We therefore expected that the immersive, interactive learning environment provided by uFin: The Challenge would contribute to developing and enhancing players' competencies with regard to MS (and MS-related constructs, i.e., empathic concerns and affective reactions). Specifically, we hypothesized:

H1. Participants playing the game will yield higher levels of MS (and MS-related constructs) after playing than participants in the control group (playing a commercial game without ethical content).
The second goal of this study involves investigating the impact of two intervention strategies in promoting learning. While there may be many factors that enhance learning of MS, we intend to examine the role of self-reflection in the form of encouraging players to evaluate and justify their own behaviors and interactions with the nonplayer characters to potentially adjust their goals and strategies. In business ethics education settings, a plethora of courses exist that base ethics education mainly on promoting deliberate and reflective thinking when faced with ethical dilemmas. Much of the educational research over the past years has highlighted the importance of self-regulated learning, whereby learners actively monitor and regulate their behavior to construct goals and appropriate strategies (Butler & Winne, 1995; Nicol & Macfarlane-Dick, 2006; Pintrich & Zusho, 2002). Specifically, Hoffman and Spatariu (2008) argue that metacognitive prompts, i.e., externally generated stimuli or instructions that evoke active reflective cognition, represent a central factor in stimulating the use of self-regulated learning. There is much evidence supporting the positive influence of metacognitive prompting in enhancing learning (e.g., Fiorella & Mayer, 2012; Hoffman & Spatariu, 2008; Kim, Park, & Baek, 2009; Kleinheksel, 2014; Kramarski & Zeichner, 2001). In the current study, we implemented metacognitive prompting by asking participants to evaluate and justify their decisions and to imagine how some of the nonplayer characters might feel. Since a narrow focus on business frames and a lack of responsiveness to others are often the problem behind moral blindness (Narvaez, 2010; Palazzo et al., 2012; Tenbrunsel & Smith-Crowe, 2008), as argued earlier, the purpose of those instructions was to promote flexible framing, relational connections, and the ability to discern different possibilities of action (Palazzo et al., 2012). Hence, our hypothesis was as follows:

H2. Participants prompted to critically reflect on their own behavior while playing our game will show higher learning effects in MS (and MS-related constructs) after playing than participants in a control condition.
The second intervention to enhance learning effects that we wished to explore is based on providing situational cues within the game context. This idea is related to numerous studies revealing that moral behavior can be subtly influenced by exposing individuals to external stimuli. For instance, empirical studies have shown that exposure to images of nature (Zelenski, Dopko, & Capaldi, 2015), warm lighting (Wessolowski, Koenig, Schulte-Markwort, & Barkmann, 2014), ethics-related words (Welsh & Ordóñez, 2014), or family pictures (Wang, Zhong, & Murnighan, 2014) can "prime" prosocial mindsets and thereby ethical behaviors. Such cues are likely to influence ethical behavior through the automatic activation of moral schemas. In contrast, mere exposure to money (Vohs, Mead, & Goode, 2006), business objects (such as a briefcase or boardroom tables) (Kay, Wheeler, Bargh, & Ross, 2004), or symbols of wealth (Gino & Pierce, 2009; Wang et al., 2014) has been shown to evoke a calculative, competitive mindset and to render unethical behavior more likely. This research is also related to the term nudging, which refers to strategies that subtly steer people's decision-making by means of situational cues (Thaler & Sunstein, 2009). Thus, we believe that such cues may facilitate or hinder the development of MS.
To examine the influence of various situational settings on learning, we have developed two versions of uFin: The Challenge. The standard version has several stimuli embedded within the game setting that, based on past research, are expected to evoke a calculative mindset. In contrast, the prosocial version has stimuli embedded that are expected to evoke a prosocial, caring mindset. We hypothesized:

H3. Participants playing the prosocial version of the game (evoking a prosocial mindset) will yield higher levels of MS (and MS-related constructs) after playing than participants playing the standard version of the game (evoking a calculative mindset).
Specifically, for the prosocial version of the game, we have redesigned many visuals and text-based descriptions in the game setting, with the idea of increasing the likelihood of evoking a prosocial mindset. Fig. 3 shows one scene of the game in both the standard (calculative) and the adapted prosocial version. The following three differences are clearly visible: 1) The bull and bear sculpture, typically representing the stock market cycle, was replaced with blooming plants in the prosocial version. 2) The firm logo was adjusted in the prosocial version: the rising jagged line was replaced with helping hands. 3) The strict uniformity of the nonplayer characters was replaced with light-colored dresses. Beyond that, other changes included room lighting (cold vs. warm lighting) and pictures (a picture of the best-performing employee versus a picture of the greatest team).

Fig. 2. An example of a result report presented to players at the end of the game. It indicates the selected clues (left), the overall collected MS points (symbolized by the scale), empathic concern points (symbolized by the heart), and business sensitivity points (symbolized by the dollar sign) (middle), as well as the detection score reflecting how much they discovered about the illegal/ethical problems in the subsidiary (right).
Prior to the experimental study, we ran a pilot study (N = 49) to verify whether all scenes of the calculative and prosocial version were perceived differently. Participants were asked to rate the extent to which they judged the scenes in the calculative and prosocial version as cold versus warm and calculative versus prosocial on bipolar scales ranging from −3 to +3. The results confirmed that the scenes of the prosocial version were overall clearly perceived as warmer (M calc = −0.82, M prosoc = 1.50, p < .001) and more prosocial (M calc = −0.85, M prosoc = 1.46, p < .001).

Participants
Participants were recruited from a Swiss and a German university (N = 345). The mean age of the participants was 23 years (SD = 4.58). Of this sample, 56.5% were female, 40.6% were economics students, and the rest were students from other fields. Furthermore, 53% reported working (full- or part-time), while the rest did not work. Sixty percent of the participants reported having already taken part in ethics training.

Experimental design
This study consisted of three experimental groups (Groups 1-3) playing uFin: The Challenge and one control group (Group 4) playing a commercial game without ethical content (Rollercoaster Tycoon, produced by Atari). To allocate the participants to the experimental groups and the control group, we made use of an unequal randomization ratio. Such an allocation procedure is advocated when researchers are interested in gaining more data from the experimental groups than from the control group without losing statistical power (Dumville, Hahn, Miles, & Torgerson, 2006). Clearly, we were interested in gaining more experience with uFin: The Challenge than with the commercial game. Therefore, upon registration, our participants were randomly assigned to one of four groups, using a randomization ratio of 3:2 (experimental groups versus control group). Importantly, these groups did not significantly differ in terms of sociodemographic characteristics.
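As an illustration of unequal randomization, the following sketch assigns each registering participant to one of the four groups with fixed probabilities. It rests on one possible reading of the 3:2 ratio, namely that three fifths of participants go to the experimental groups (split evenly across Groups 1-3) and two fifths to the control group. The weights, the function name `assign_groups`, and the use of simple (unblocked) randomization are assumptions for illustration; the text does not describe the exact allocation procedure.

```python
import random
from collections import Counter
from typing import List

# Assumed allocation weights: Groups 1-3 (experimental) share three parts,
# Group 4 (control) receives two parts, i.e., 3:2 overall.
GROUP_WEIGHTS = {1: 1, 2: 1, 3: 1, 4: 2}


def assign_groups(n_participants: int, seed: int = 0) -> List[int]:
    """Draw a group for each participant using the assumed weights."""
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    groups = list(GROUP_WEIGHTS.keys())
    weights = list(GROUP_WEIGHTS.values())
    return rng.choices(groups, weights=weights, k=n_participants)


assignments = assign_groups(345)
print(Counter(assignments))  # group sizes vary by chance around the ratio
```

With simple randomization of this kind, the realized group sizes only approximate the target ratio; a blocked scheme would be needed to match it exactly.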
For all participants, the experimental design followed a four-step procedure including a pre-test phase (Step 1), two training phases (Steps 2 and 3), and a post-test phase (Step 4). Fig. 4 provides an overview of the research design and sequence of the events. Overall, the study took place over two weeks.
Game (our game vs. commercial game), self-reflection (yes vs. no) and situational stimuli (standard vs. prosocial version of the game) served as the independent variables. We used two sets of dependent variables: (1) measures of MS and MS-related constructs (empathic concerns, affective reactions) stemming from questionnaire data, and (2) MS, EC, and BS points as well as the detection rate (stemming from game data).

Measures of dependent variables
Following measures of MS and MS-related constructs based on questionnaire data were used. Moral sensitivity. To assess this variable, we adopted the approach by Schmocker et al. (2021). Participants were provided with four vignettes, each describing an ethical problem within a fictitious company. After reading each vignette, participants were asked to rate the importance of considering each of eight statements on a scale from 1 (not important at all) to 7 (very important). While half of those statements were designed to assess the sensitivity for business-related values (but not relevant in this study), the other half of statements were designed to assess the sensitivity for moral values (e.g., "How important do you find it to consider whether the pending decision could disadvantage other applicants?"). The average score of those four statements (across all vignettes) was used to assess MS (α = 0.71). Past research has demonstrated good evidence of measure's reliability and validity (for a detailed description, see Schmocker et al., 2021).
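To make the scoring concrete, the sketch below computes a participant's MS score as the mean of the moral-value ratings and estimates internal consistency with the standard Cronbach's alpha formula, α = k/(k − 1) · (1 − Σ item variances / variance of the sum scores). The ratings shown are invented for illustration, and the function names are hypothetical. The text does not specify at which level (single statements or vignette means) the reported α was computed, so treating each column as one item is an assumption.

```python
from statistics import mean, variance
from typing import List, Sequence


def ms_score(moral_ratings: Sequence[float]) -> float:
    """Average of one participant's moral-value ratings (1-7 scale)."""
    return mean(moral_ratings)


def cronbach_alpha(data: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants-by-items rating matrix."""
    k = len(data[0])  # number of items
    item_variances = [variance([row[i] for row in data]) for i in range(k)]
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented ratings: one row per participant, one column per item.
ratings = [
    [6, 5, 7, 6],
    [4, 4, 5, 3],
    [7, 6, 6, 7],
    [3, 4, 2, 3],
    [5, 5, 6, 4],
]
scores = [ms_score(row) for row in ratings]
alpha = cronbach_alpha(ratings)
print(scores, round(alpha, 2))
```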
Empathic concern and affective responses. Again drawing on Schmocker et al. (2021), two additional scales were provided after each vignette. To assess empathic concern, we used three items from the German Saarbrucken Personality Questionnaire (Paulus, 2009), which were adapted to the content of the vignette. For example, participants were asked to respond to an item, "Seeing how someone might be hired because of personal contacts, makes me want to protect other applicants", on a scale from 1 (does not apply at all) to 7 (applies very much) (α = 0.83). To assess affective responses, participants rated the extent to which they would judge a given decision of the company as outrageous or shameful (Tanner, Ryf, & Hanselmann, 2009;Tetlock et al., 2000). For example, participants were asked to rate the statement "To what extent would you judge it as outrageous if the client's son had been chosen for the position?" on a scale from 1 (not at all) to 7 (very much) (α = 0.88).
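The α values reported for these scales are Cronbach's alpha coefficients. A minimal sketch of the standard formula follows, assuming item scores are available as one list per item, aligned by respondent. The function name and the data layout are hypothetical, and the sketch does not reproduce the study's data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score per respondent: sum across items at the same position.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))
```

Alpha approaches 1 when items covary strongly and falls toward 0 when they are unrelated, so values such as 0.71 to 0.88 indicate that the items of each scale move together to a reasonable degree.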
Finally, the following four measures from the game were included: moral sensitivity (MS points; i.e., the extent to which players were alert to moral issues), empathic concern (EC points; i.e., the extent to which players were nice and respectful to others in the game), business sensitivity (BS points; i.e., the extent to which players were alert to business-related issues), and the detection score (i.e., the extent to which players discovered the illegal/ethical problems in the subsidiary).
Note on timing: Step 2 followed 4-7 days after Step 1. Step 4 took place immediately after Step 3.

Pre-test phase
In Step 1, all participants completed an online questionnaire to assess MS, affective and empathic responses (= MS-related constructs), sociodemographic characteristics, and other concepts to mask the moral content.

Training phase
During this phase, participants came to our laboratory twice, with a time gap of at least four days between Step 2 and Step 3. Depending on the randomization, participants either played the SMG (Groups 1-2 = standard version; Group 3 = prosocial version) or the control game (commercial game without ethical content) (see Fig. 1). Between the two training sessions, all participants were provided with a set of questions. The questions varied in content, depending on the experimental condition, but were of equal length to allow for about the same time of exposure and thus to mask the different conditions among the participants. The metacognitive prompt group was provided with questions intended to broaden one's own view, elicit empathic concern and perspective-taking, and potentially induce goal adjustments. The other groups were asked to evaluate the design of the game.
We had two reasons to include two training sessions. First, a second session was required to examine self-reflection effects at all. Second, we wished to support learning and habit change by repeating the session (see Wouters et al., 2013, for evidence that more sessions help to increase learning effects).
Group 1 (n = 87) played the standard version of the SMG in both training sessions. Between the two training sessions, these participants were provided with nine questions referring to the game design (e.g., "How well do you like the design of the game?"; "How did you like the point-and-click mode in the game?"; "How well were you able to follow the game history?"). Upon arrival at the second training session (Step 3), right before replaying, participants were provided with the individual responses they had given to the game design questions and asked to write down three goals as to which game criteria they would pay more attention to now.
Group 2 (n = 104) also played the standard version of the SMG twice. Between the two training sessions, these participants were provided with metacognitive prompts. Nine questions targeted self-evaluation, flexible framing, and empathic concern (e.g., "In retrospect, which of your decisions during the game were you least happy with? Why?"; "How would others possibly judge your decisions in the game?"; "How might the office manager feel about her role within the organization?"). Upon arrival at the second training session (Step 3), participants were provided with their answers to the metacognitive prompts and asked to write down three goals that reflect how they want to adjust their behavior when replaying the game.
Group 3 (n = 89) played the prosocial version of our game twice, exposing participants to prosocial nudges. Otherwise, the procedure for Group 3 was identical to that of Group 1: between the training sessions, these participants also received the questions about the game design.
Finally, Group 4 (n = 65), representing the control group, played the control game twice. Between the two training sessions, participants were asked to respond to the same game-related design questions as those of the participants of Groups 1 and 3.

Post-test phase
Immediately after the second training session, participants completed an online questionnaire to assess MS and MS-related constructs again (Step 4). To avoid test-retest effects, two parallel blocks of questions (split-half methodology) were used. In the pre-test phase, roughly half of the participants were provided with the questions of Block 1, while the rest were provided with the questions of Block 2. In the post-test phase, the two blocks were exchanged.
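The exchange of blocks between pre-test and post-test can be sketched as follows. This is an illustrative sketch only: how participants were split across the two blocks is not described, so the random split, the function name `counterbalance`, and the block labels are assumptions.

```python
import random

BLOCKS = ("Block 1", "Block 2")


def counterbalance(participant_ids, seed=0):
    """Map each participant to a (pre-test block, post-test block) pair.

    Roughly half start with Block 1, the rest with Block 2; at post-test
    every participant receives the block not seen at pre-test.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # assumed: the split into halves is random
    half = len(ids) // 2
    schedule = {}
    for position, pid in enumerate(ids):
        pre = BLOCKS[0] if position < half else BLOCKS[1]
        post = BLOCKS[1] if pre == BLOCKS[0] else BLOCKS[0]
        schedule[pid] = (pre, post)
    return schedule
```

Because no participant sees the same vignettes twice, any pre-to-post change cannot be attributed to remembering earlier answers, provided the two blocks are of comparable difficulty.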

Procedure
At the beginning (Step 1), all participants were asked to fill in the pre-test questionnaire (online) to determine their baseline level of MS, affective responses, and empathic concern. This was done 4-7 days prior to the first training phase (Step 2). Participants then scheduled an appointment and visited our laboratory for this first training phase. Each participant was seated in an individual cubicle. Upon arrival, participants were familiarized with the course of the session and instructed on how to use the computer tablet. They then played either uFin: The Challenge or the control game for approximately 60 min. Afterward, all participants were provided with a set of questions (self-reflection or game design questions). Four to seven days later, the participants returned to the laboratory for the second training session (Step 3) to play again for approximately 60 min. Just prior to this second training, participants were provided with the same questions, including their own responses, and asked to write down three goals to pursue in the next trial. Immediately after the second training session, participants were asked to complete the post-test questionnaire (Step 4) to determine their potential advances in MS. Finally, the participants were debriefed and compensated with CHF 80 (Switzerland) or € 40 (Germany) for their full participation, which lasted roughly 4 h.

Analyses building on questionnaire data
As elaborated earlier, two parallel blocks of vignettes and MS questions were created to avoid test-retest effects. Importantly, t-tests comparing the two blocks within the pre- and the post-test phases did not reveal significant differences for any of the MS and MS-related measures (ps > .05), indicating that the MS scores, affective responses, and empathic concern did not differ between the two blocks. Therefore, for both Step 1 and Step 4, we continued by averaging across Blocks 1 and 2. To test the hypotheses, we ran a multivariate analysis of variance with covariates (MANCOVA) to account for intercorrelations among the dependent variables and to control for covariates. Gender (male/female), field of study (economics/other fields), employment (yes/no), and previous ethics training (yes/no) were included as covariates. For each dependent variable (MS, empathic and affective responses), we calculated difference scores. That is, the pre-test scores (Step 1) were subtracted from the corresponding post-test scores (Step 4). Thus, a positive delta indicates a positive change in those scores. Table 2 (left half) reports the results for each of the dependent variables obtained in Step 1 and Step 4 and for each of the four experimental groups.
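The difference scores described here amount to post-test minus pre-test per participant, summarized per group. A minimal sketch, assuming each record carries hypothetical keys `group`, `pre`, and `post`; the example values are made up for illustration and are not data from the study.

```python
from statistics import mean


def difference_scores(records):
    """Return the mean post-minus-pre delta per group.

    A positive mean delta indicates an increase from Step 1 to Step 4.
    """
    deltas = {}
    for record in records:
        delta = record["post"] - record["pre"]
        deltas.setdefault(record["group"], []).append(delta)
    return {group: mean(values) for group, values in deltas.items()}


# Made-up illustrative values on the 1-7 rating scale.
example = [
    {"group": 1, "pre": 5.0, "post": 5.5},
    {"group": 1, "pre": 4.5, "post": 5.0},
    {"group": 4, "pre": 5.0, "post": 4.75},
]

if __name__ == "__main__":
    print(difference_scores(example))
```

In the full analysis these deltas serve as the dependent variables of the MANCOVA, which this sketch does not attempt to reproduce.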

General game effectiveness
To examine the effectiveness of the SMG in improving MS and MS-related constructs relative to the control game, we compared the pre- and post-test scores of the dependent variables from Group 1 and Group 4 (H1). The MANCOVA revealed a significant difference between playing the SMG and the control game (Wilks' Lambda = 0.90, F(3, 144) = 5.74, p = .001, ηp² = 0.11), indicating a medium effect. None of the covariates was significant. Follow-up analyses showed, as expected, significant differences with regard to all three dependent variables (see Table 2). The results indicate small to medium effects. Thus, H1 tends to be confirmed.

Role of self-reflection
To examine whether encouraging participants to critically reflect on their behavior may facilitate the development of MS and MS-related constructs (H2), we compared data from Group 1 (game design questions) and Group 2 (metacognitive prompts). The MANCOVA revealed a significant difference between Group 1 and Group 2 (Wilks' Lambda = 0.90, F(3, 183) = 6.55, p < .001, ηp² = 0.10), indicating a medium effect. Again, none of the covariates was significant. Strikingly, the group that received metacognitive prompts was more likely to decrease rather than to increase MS. Follow-up tests revealed significant negative changes for MS (F(1, 185) = 6.12, p = .014, ηp² = 0.03) and affective reactions (F(1, 185) = 15.90, p < .001, ηp² = 0.08), indicating a small and medium effect, respectively. These results are in contrast to H2. These somewhat surprising results suggest that encouraging people to reflect on their decisions and their interactions with nonplayer characters hindered rather than facilitated MS.

Role of prosocial stimuli
To examine whether embedding prosocial stimuli within the game setting contributes to positive changes in MS and MS-related constructs (H3), we compared data from Group 1 (playing the standard game version) and Group 3 (playing the prosocial game version). We found a significant main effect of game version (Wilks' Lambda = 0.95, F(3, 168) = 3.10, p = .028, ηp² = 0.05). Again, none of the covariates was significant. The follow-up test revealed a significant effect of game version on affective reactions (F(1, 170) = 9.05, p = .003, ηp² = 0.05). However, in contrast to our expectations, participants playing the standard version (Group 1), not those playing the prosocial version (Group 3), showed a higher positive change in affective reactions. Thus, these results do not support H3.
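The effect sizes reported for the three comparisons above can be cross-checked from the F statistics and their degrees of freedom, using the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the reported values. The verbal labels use Cohen's conventional benchmarks (0.01, 0.06, 0.14), which is an assumption, since the cutoffs used in the study are not named; the function names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def label(eta_p2):
    """Verbal label under Cohen's conventional benchmarks (assumed, not from the study)."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# (description, F, df_effect, df_error) as reported in the text.
REPORTED = [
    ("H1 multivariate", 5.74, 3, 144),
    ("H2 multivariate", 6.55, 3, 183),
    ("H2 follow-up, MS", 6.12, 1, 185),
    ("H2 follow-up, affective reactions", 15.90, 1, 185),
    ("H3 multivariate", 3.10, 3, 168),
    ("H3 follow-up, affective reactions", 9.05, 1, 170),
]

if __name__ == "__main__":
    for name, f_value, df_effect, df_error in REPORTED:
        eta = partial_eta_squared(f_value, df_effect, df_error)
        print(f"{name}: {eta:.2f} ({label(eta)})")
```

Rounded to two decimals, the recomputed values agree with the effect sizes reported in the text.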

Table 1
Means (standard deviations) and intercorrelations between dependent variables.

Analyses with game data
We next conducted analyses with participants' game data, specifically the MS, EC, and BS points, as well as the detection rate (a kind of performance score), that participants accumulated until the end of the game. Unfortunately, due to a technical problem (data of players who did not terminate the game according to our instructions were not recorded), we could not include all of the 280 participants playing uFin: The Challenge in the complete data set (N = 176). Table 1 (lower half) presents descriptive statistics and correlations among all dependent variables for Step 2 and Step 3. Again, we built upon difference scores; scores from the first training session (Step 2) were subtracted from the corresponding scores from the second training session (Step 3) (see Table 2, right half). Positive deltas indicate positive changes.
We first tested H2 and H3 and analyzed the effects of the experimentally manipulated intervention conditions (self-reflection and prosocial nudging) on participants' game-based difference scores. Again, we ran MANCOVAs. We found no evidence that either the self-reflection or the prosocial nudging condition affected these difference scores.
In a next step (and related to H1), we combined all groups playing the SMG (Groups 1-3) and checked for relative changes in MS, EC, and BS points and detection rates from the first to the second training session (Step 2 and Step 3), computing a MANCOVA with repeated measures. None of the covariates was significant. We found an overall significant effect of training (Wilks' Lambda = 0.91, F(4, 168) = 3.99, p = .004, ηp² = 0.09), reflecting a medium effect size. Follow-up tests mainly revealed a significant change from the first to the second training on moral sensitivity points (F(1, 171) = 6.91, p = .009, ηp² = 0.04) and detection rate (F(1, 171) = 14.83, p < .001, ηp² = 0.08), indicating a small and medium effect, respectively. As shown in Table 2 (right half), these changes were positive in all three groups, meaning that players improved over time. However, compared to changes in BS and EC points, the largest improvements were found for the detection rate and MS points. That is, replaying the SMG helped to enhance MS scores and, in turn, the performance (detection rate) more than the other dimensions.

Discussion and conclusions
This research was designed to investigate the role of our SMG and two intervention strategies in enhancing MS. First, in terms of the effectiveness of the SMG, our findings can be summarized as follows: Playing the SMG contributes to the promotion of MS (and some other MS-related constructs) compared to playing a nonethical game (H1). The results revealed positive changes in the pre- and post-test questionnaire measures of MS. Furthermore, playing the game twice revealed a larger enhancement in game-based MS points than in the other dimensions. Clearly, changes in the game data scores are likely to reflect other influences as well, such as increased experience with the game itself. It is nevertheless interesting to observe that MS scores revealed more changes from the first to the second training session than the other scores. Thus, the results reflect some robustness in the conclusion that uFin: The Challenge facilitates the learning of MS. These results fit well into general serious game research, supporting the view that serious games can have a positive influence on various learning outcomes (e.g., Clark et al., 2016; Wouters et al., 2013, 2017). Certainly, it is important to note that the game mainly had only small learning effects on MS when comparing the scores of the pre- and post-test questionnaires or comparing the game data scores of the first and second training sessions. However, it is also important to keep in mind that participants only played the game twice. It may well be that longer or more intensive training (possibly also with further game levels and other stimuli) could further enhance the learning effects and change MS-related habits. As Wouters et al. (2013) suggested on the basis of their meta-analysis of serious games, multiple training sessions increase learning effects.
Our study, however, shows that even small but significant changes in MS (and MS-related constructs) can already be achieved in a rather limited time. It is also worth noting that in the control group MS, EC, and AR somewhat decreased from Step 1 to Step 4. Recall that this group played Rollercoaster Tycoon, a commercial game about building an amusement park. To play this game successfully, skills other than moral ones are obviously required. Moreover, it is interesting that playing this game actually seems to hinder the development of MS, EC, and AR.
Second, regarding the role of metacognitive prompts (H2), at least in this game-play scenario, the results revealed that encouraging participants to actively reflect on their decisions and their interactions decreased MS. This finding is somewhat surprising and inconsistent with prior research indicating that metacognitive prompts have a positive influence on learning outcomes (Fiorella & Mayer, 2012; Hoffman & Spatariu, 2008; Kim et al., 2009; Kleinheksel, 2014; Kramarski & Zeichner, 2001). It is possible that our questions were simply not powerful or suitable enough, or that there were simply too many questions (nine questions targeting too many different themes). While other types of prompts might undoubtedly be more useful, our study also raises the question of whether metacognitive prompting can sometimes lead to counterproductive results by hindering rather than facilitating self-regulation. Such an alternative explanation relates to other empirical research demonstrating that demanding conditions (such as being engaged in reflective, and hence deliberative and effortful, processes) are likely to interfere with working memory and to deplete energy resources, thereby mitigating self-regulation (for an overview, see Koole, Jostmann, & Baumann, 2012). To our knowledge, educational research has hardly approached self-regulated learning from this perspective. Obviously, future studies will need to explore this issue more thoroughly.

Table 2
Means (standard deviations) of dependent variables at Step 1 and Step 4 for questionnaire data, Step 2 and Step 3 for game data, and the corresponding means of changes.
Note. n Group4 = 65. MS = Moral sensitivity, EC = Empathic concern, AR = Affective reactions, MP = Moral points, EP = Empathy points, BP = Business points, DR = Detection rate; ΔS4-S1 = Step 4 - Step 1.

Third, regarding the role of providing prosocial cues within the game to trigger a prosocial mindset, we found no support for H3. Although we pilot tested the implemented stimulus material, the prosocial version of the game evoked no changes in the expected directions. Unexpectedly, the standard version was more likely to evoke an increase in affective reactions than the prosocial version. One tentative explanation may be that those subtle situational nudges were simply too weak to make a difference for the players who were embedded in a complex and dynamic setting. The goal-related schemas may have dominated the way that players interpreted the scenario, including the mission and the description of the bank.
There are certainly many avenues for future research on how the positive effects on MS may be strengthened. For example, the game might be more effective if players slip into the role of an ethics officer, or if they are not subjected to time pressure. Educators may also think of combining the game with other teaching elements. Our attempt to achieve this with metacognitive prompting or prosocial nudging was not successful. But there is still scope to explore many other didactic techniques and instructions, such as group discussions of the game results or detailed debriefings before replaying. Tannenbaum and Cerasoli (2013) have shown that properly conducted debriefs are highly useful in enhancing learning effects. Players may also benefit differently from the game when playing in small groups while exchanging opinions and discussing alternative ways of behaving, compared to playing the game alone (Wouters et al., 2013). Designers may think, for instance, of changing and investigating the role of other specific game elements to improve learning, such as implementing alternative feedback mechanisms within the game. Furthermore, since our results are based on data from a sample of relatively young students, future studies are certainly needed to examine whether the effectiveness of the game also applies to other target groups (managers, employees), including broader age ranges, and to examine transfer and long-term effects.
In terms of the game itself, a limitation could be that it takes rather long to play (about 60 min) and is thus time-consuming. Therefore, in terms of further development, one may think of adapting the game by dividing it into many smaller units. This may bring some advantages. First, with smaller but more numerous training units, solving each unit is less time-consuming for trainees. Second, individuals receive more opportunities for repeated practice, which may help strengthen new habits.
Overall, this study contributes to filling the knowledge gap on the effectiveness of game-based learning in the moral domain by conducting an experiment that meets high methodological standards, such as comparison conditions, pre-post tests, and randomization, and does not draw on self-reported improvements. Furthermore, this research has broader relevance to the business ethics training literature, since we know of hardly any research demonstrating the effect of SMGs on moral sensitivity. Thus, uFin: The Challenge may be considered by educators in the business ethics domain to be an effective teaching tool for improving MS (and related concepts). However, we encourage educators and researchers to search for and examine other ways to further improve the effectiveness of this game and to learn more about the potential of SMGs for promoting MS more generally.

Data reference
Datasets are available in the Open Science Framework repository, https://osf.io/ueq45/files/.