Gamification is Working, but Which One Exactly? Results from an Experiment with Four Game Design Elements

Abstract
Current gamification research usually examines several game design elements at the same time, which makes it difficult to distinguish how and to what extent individual game design elements increase motivation. We address this research question by individually examining four game design elements (progress bar, narrative, feedback, and badges) in an online experiment. In addition, combinations of game design elements were tested to gain insight about additive effects on motivation. The study included 505 subjects who answered a maximum of 190 different multiple-choice questions. The subjects were told to answer questions only as long as they enjoyed answering them. The results provide statistically significant motivational gains for all individual game design elements. Interestingly, not all game design elements benefit from a combination in the same way. The results of our study indicate that an increase in motivation through gamification is already possible if only an individual game design element is added.

Gamification is inspired by video games, which possess the ability to have an intrinsic motivating effect by addressing the basic psychological needs for autonomy, competence and relatedness (Ryan et al., 2006). It is therefore reasonable to transfer this motivating effect of video games to other non-playing contexts via gamification (Seaborn & Fels, 2015).
The definition of Deterding et al. (2011) is not the only existing definition of gamification. Another definition that applies to the context of this article defines gamification as the use of "[ … ] game-based mechanics, aesthetics and game thinking to engage people, motivate action, promote learning, and solve problems." (Kapp, 2012, p. 54). Many more exist, but the focus of each definition differs and is shifted toward the general idea of gamification (Muntean, 2011), business benefits (Werbach & Hunter, 2012), user experience and engagement (Domínguez et al., 2013) or a path to mastery (Kumar, 2013). Many of these definitions may be more appropriate in individual cases, but the definition of Deterding et al. (2011) seems to be the most helpful for generalization. A further discussion of the issue of defining gamification can be found in Treiblmaier et al. (2018).
Gamification can also be considered as a technique to create or promote a flow condition. In flow theory, this condition requires full focus (Csikszentmihalyi, 2009); gamification, in contrast, integrates game design elements in such a way that they are perceived both consciously and unconsciously.
Countless apps and online platforms now use game design elements to make their applications more motivating and to retain customers (Kalafatoğlu, 2020), for example Google Maps with its local guide system (Deal, 2018). Various studies have already shown that gamification has in general a motivating effect (Dichev & Dicheva, 2017). Usually, in order to test the effect of gamification in different situations, most studies use a combination of different game design elements simultaneously (Domínguez et al., 2013; Sailer et al., 2013). Furthermore, the results achieved by using game design elements can differ to a great extent (Koivisto & Hamari, 2019). Still, not every gamification project is successful. Some projects have failed because, e.g., the gamified system was too intrusive, was not aligned with already existing systems or in general fell below the initial expectations (Liu et al., 2017). From this point of view, gamification is no guarantee of success.
A possible explanation for these differences could be an interaction of the different game design elements. This means that individual game design elements could reinforce or neutralize each other in their effect. In order to find out whether and how different game design elements might influence each other, the first step is to look more closely at the effect of the individual game design elements (Dichev & Dicheva, 2017). But how exactly different game design elements work, and whether there may be differences between the individual elements in terms of supporting motivation, has so far only been investigated by a few studies (Bräuer & Mazarakis, 2019; Christy & Fox, 2014; Groening & Binnewies, 2019; Hamari, 2017; Mekler et al., 2017). This has been an open research question for many years (Mekler et al., 2013) and it has still not been answered sufficiently (Koivisto & Hamari, 2019; Mazarakis, 2021).
This experimental field study addresses this question and examines individual game design elements with regard to their motivating potential. In our study we examine four game design elements (badges, feedback, progress bar and narrative) separately. We conducted an online quiz which was gamified with the game design elements mentioned above, testing them individually and in combination. The term quiz is complex and is defined by Raikar (2021) as "a contest in which participants test what they know by answering questions on one or more topics," which means that a quiz is not necessarily a game. However, at the same time, "the term quiz is a capacious one" because, for example, "it can refer to a single game consisting of just a few questions, or it can refer to a large-scale event involving dozens or hundreds of people." (Raikar, 2021). Non-game examples are exams taken by medical students at a university (Dengri et al., 2021) or the quiz of an online training course with enterprise gamification (Stanculescu et al., 2016). Furthermore, a quiz can be considered as a game element (Saxegaard & Divitini, 2019), a learning element (Berger et al., 2019) or as an affordance (Koivisto & Hamari, 2019) in gamification. For example, an online quiz to examine the effects of gamification on learning was also used in a study by Sanchez et al. (2020), showing positive effects for short-term assignments. In general, gamification can be considered as conceptually close to game design; at the same time, however, a gamified application does not have to be fun (Landers et al., 2018, p. 317).
The results of the experiment provide new and valuable insight into the individual and combined effects of gamification on motivation. This is contrary to the general approach of obtaining self-reported measures through storyboards (Hallifax et al., 2019), questionnaires or surveys (Koivisto & Hamari, 2019), which are usually easier to conduct but also provide less insight, because their findings are less comparable than those of well-executed inferential studies like experiments. To advance gamification research we need controlled experiments in order to gain knowledge on the actual effects of gamification.
After the introduction section we present related work regarding motivating individuals, fundamental challenges in gamification research and a general description of the game design elements we used in the study. The following methods section includes the hypotheses, a detailed explanation of the design of the gamification and the experiment itself. After describing the sample, descriptive and inferential results are given. We conclude the article with a discussion of the findings, limitations, and an outlook for future research. The results of this study are based on a poster presented at the conference "Technology, Mind, and Society (TechMindSociety '18)" (Mazarakis & Bräuer, 2018). This article is a thoroughly enhanced and extended version of the published abstract.

Motivating individuals
As Ryan and Deci (2000) point out, intrinsically motivated individuals perform an activity because the activity itself is interesting, rather than because of a separate consequence, reward or pressure. However, not every individual is equally intrinsically motivated by an activity; there are individual and contextual differences in motivation potential. Extrinsically motivated individuals, on the other hand, perform an activity because of the associated remuneration. Thus, extrinsic motivation is the opposite of intrinsic motivation. Intrinsic motivation is usually credited with the higher potential. Nevertheless, extrinsic motivation is necessary, because many activities possess no intrinsic motivation potential, for example a boring task (Ryan & Deci, 2000).
Reaching and sustaining intrinsic motivation is not an easy task, because many risks threaten it. For example, competition can jeopardize intrinsic motivation (Vansteenkiste & Deci, 2003), whereby personality traits can have an influence (Epstein & Harackiewicz, 1992; Tauer & Harackiewicz, 1999; Vallerand et al., 1986). In an experiment it was shown that highly performance-motivated subjects prefer a competitive environment, whereas subjects with low performance motivation prefer to avoid competition (Tauer & Harackiewicz, 1999). In addition, the actual performance of the subjects is also influenced. Competition can therefore be a social phenomenon that can both increase and decrease intrinsic motivation (Epstein & Harackiewicz, 1992), so that we need to consider both trait and state of individuals. Thus, it becomes clear that the motivation of individuals is a highly complex matter whose investigation faces many challenges.
Gamification research tries to avoid a clear positioning of whether gamification is intrinsic or extrinsic motivation by using the term "affordances" (Jia et al., 2016). In line with the definition of motivation of Ryan et al. (2006), we consider gamification as externally endorsed and therefore extrinsic. Mekler et al. (2013) also assume that while game design elements can create an impact on intrinsic motivation, this effect is extrinsically generated. However, it cannot be ruled out that in some cases an extrinsic motivation may be transformed into an intrinsic motivation.

Fundamental challenges in gamification research
Many experimental studies in the field of gamification do not consider the game design elements they use individually; usually all game design elements are applied simultaneously, which leaves the individual effect of each game design element unanswered (Mekler et al., 2013). Although it has been shown in many studies that gamification can motivate in various ways (Sailer et al., 2013; Seaborn & Fels, 2015), we argue in the same way as Mekler et al. (2013) that possible effects of individual elements often remain undetected. Detecting them is necessary to provide support for good design choices, which is also acknowledged as a research gap (Koivisto & Hamari, 2019). Still, there are at present only very few experimental studies (Christy & Fox, 2014; Huschens et al., 2019; Landers et al., 2017; Sailer, Hense, Mayr, et al., 2017) that consider the effects of individual game design elements.
Additionally, the triad of points, badges and leaderboards (Werbach & Hunter, 2012) is still the most frequently examined set of game design elements in gamification research (Koivisto & Hamari, 2019). Mekler et al. (2013) showed that points, levels and leaderboards, applied individually in an online image classification process, are viable means to influence user behavior. It is therefore not necessary to apply all game design elements together. It is also assumed that individual game elements fulfill different functions or address different basic needs. However, the motivating effect could be different in interaction with other elements (Sailer et al., 2013). Therefore, it is important to be aware of the relationship between the different elements.
Another study, which addresses individual game design elements and the combination of different elements, was carried out by Schöbel et al. (2016). By conducting a survey, they examined which game design elements and combinations are preferred in learning environments. Most popular was the game design element level followed by points and goals. With regard to the number of combined elements, most subjects indicated that they preferred between 3 and 4 game design elements (Schöbel et al., 2016). However, these results should be viewed with caution since they are not based on an experimental study but on a survey.
To get experimental and empirical insight into the individual effects of game design elements, we need to limit our study to a certain number of elements. So, we focus on the game design elements badges, feedback, progress bar and narrative. On the one hand we approach already heavily researched game design elements like badges and feedback (Hamari, 2017; Mekler et al., 2013; Seaborn & Fels, 2015; Zichermann & Cunningham, 2011). We intentionally did not research just the three game design elements points, badges and leaderboards, also known as the PBL triad, as it is already known that they can harm motivation and performance in multiple ways (Mekler et al., 2013, 2017), with leaderboards as a prominent example (Bräuer & Mazarakis, 2019; Werbach & Hunter, 2012). In addition, Kapp (2014a) states that the "most effective gamification efforts include more than points and badges - they contain elements of story, challenge and continual feedback [ … ]" (Kapp, 2014a, p. 52).
The analysis of individual and joint game design elements is a current trend in gamification science, because "possible effects of individual elements often remain undetected" (Mazarakis, 2021, p. 283). In addition, most gamification studies only compare absent vs. present gamification and do not control for the number of game design elements (Groening & Binnewies, 2021, p. 3). Furthermore, a seminal article shows that on average 2.30 game design elements have been used in experiments, with a standard deviation of 1.20. We use four game design elements in our study, which is more than one standard deviation above the average number of game design elements typically used in experiments.
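The comparison with the reported average can be checked with a short calculation. The following is a minimal sketch that uses only the three numbers given in the text (mean 2.30, standard deviation 1.20, four elements); the function name `standard_score` is our own choice for illustration.

```python
# Values reported in the text: mean and standard deviation of the number of
# game design elements per experiment, and the number used in this study.
MEAN_ELEMENTS = 2.30
SD_ELEMENTS = 1.20
ELEMENTS_IN_STUDY = 4


def standard_score(value: float, mean: float, sd: float) -> float:
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


one_sd_above = MEAN_ELEMENTS + SD_ELEMENTS
z = standard_score(ELEMENTS_IN_STUDY, MEAN_ELEMENTS, SD_ELEMENTS)

print(f"one SD above the mean: {one_sd_above:.2f}")
print(f"standard score of four elements: {z:.2f}")
```

Four elements lie above the threshold of 2.30 + 1.20 = 3.50, at roughly 1.4 standard deviations above the mean.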
The four game design elements examined in this study differ greatly in terms of the complexity of their design. Badges and in particular narratives usually require a lot of effort, because it is important to design them in a way that they appeal to the user. They need a careful design that usually has to take mainstream taste into account. In contrast, feedback and progress bars summarize information, which is usually easier to implement. We provide a general overview of the four game design elements (badges, feedback, progress bar and narrative) and will later detail in the procedure section how we realized these elements in our study.

Badges
Badges are virtual artefacts that are visually represented. Taken from the game design element "achievements," they consist of three elements: signifier, completion logic and rewards (Hamari, 2017). They are awarded to the user for completing tasks (Antin & Churchill, 2011). Together with the game design elements points and levels, they are among the most frequently used game design elements in gamification research (Mekler et al., 2013; Werbach & Hunter, 2012). Badges can have different functions depending on how they are designed. They can be used to create a comparison with others or to challenge oneself (Gibson et al., 2015). With the help of badges it is possible to specify certain target values for the user and to incentivize their fulfilment (Sailer et al., 2013). Finally, a recent study asked subjects how they perceive badges. The results range from the perception of badges as rewards to social signaling and a function as goal setting to informing and encouraging subjects. These results are partly in line with previous results (Antin & Churchill, 2011). The possibility to share badges can also motivate subjects additionally (Sheffler et al., 2020). Badges can be implemented in different ways. To set a goal, the user is directly informed about which badges they can receive and how to act to get them. A further possibility is to use badges for which it is not known how they are awarded. The users only know that badges exist, but they do not know what to do to get them. If a badge appears unexpectedly, this can cause surprise as well as joy for the user (Zichermann & Cunningham, 2011). Both approaches (known vs. unknown badges) are possible and reasonable to achieve a successful gamification of an application.
The game design element badges was also examined individually by Hamari (2017). In a two-year study, Hamari (2017) was able to show that users of a gamified version of a sharing platform were significantly more active than users without gamification. Hakulinen et al. (2015) found equally positive results on motivation and performance after the introduction of badges. The authors looked at the effect of badges in an e-learning course and compared the behavior of a gamified group with a control group. Another study by Kyewski and Krämer (2018) also addresses the game design element badge. The authors examined the effect of badges in an online course. Unlike the other two studies, no effect on intrinsic motivation was generated by the badges (Kyewski & Krämer, 2018).

Feedback
Feedback is one of the most important game design elements for gamification (Kapp et al., 2014; Zichermann & Cunningham, 2011). In addition, with several different types it can be very valuable for gamification (Kapp, 2014b). A general definition states that feedback "[ … ] is information presented that allows comparison between an actual outcome and a desired outcome" (Mory, 2004, p. 746). Feedback is intended to provide the users with information about their performance or the status of their actions, which makes it possible to change behavior (Kapp, 2012), even just with a simple right/wrong feedback (Mazarakis, 2015), also known as confirmational feedback (Kapp, 2014b), to increase participation. Geelan et al. (2015) used feedback as one of two game design elements in an online learning tool. Based on a survey, a positive influence on the motivation and commitment of the participants during learning could be determined.
Additionally, the motivational construct of feedback plays an important role in the research area of intrinsic and extrinsic motivation. In a comprehensive study, it was shown that feedback alone can increase intrinsic motivation and, in combination with perceived autonomy and competence, eventually lead to an increase in performance (Ryan & Deci, 2000). Feedback is also crucial for flow theory. Only through feedback can individuals become aware of whether they are doing the activity in question as intended or not. Immediate and constant feedback can then help to maintain the flow state (Kapp, 2012). We also note that feedback can have different definitions and that it is a highly complex concept (Ramaprasad, 1983). In addition, a distinction can be made between intrinsic and extrinsic feedback (Patchan & Puranik, 2016). Very generally speaking, Patchan and Puranik consider intrinsic feedback as feedback that occurs naturally. In contrast, extrinsic feedback is feedback that is provided by an external source. Extrinsic feedback is also usually regarded just as "feedback" (Patchan & Puranik, 2016, p. 130). However, feedback can be acknowledged as one key element of gamification (Mazarakis, 2015; McGonigal, 2011).
It is also worth noting that subjects may expect feedback as a complementary game design element for a quiz. To determine whether this is the case, anonymized responses collected via a form at the end of the experiment are evaluated.

Progress bar
Most video games (and probably also life itself) are about moving on in some way, developing further or achieving certain goals. One of the simplest elements that can be used to visualize this progress is a progress bar (Siemens et al., 2015). Progress bars are also used in numerous online portals and shops to motivate users to complete information on their profile pages or to guide them through the steps of a purchase process. A subject thus gets clear information about, e.g., what percentage of an activity or task has been accomplished and approximately how much is left (Myers, 1985). Unfortunately, this game design element is often neglected, despite being rather easy to implement in a system. The lack of research about progress bars as a game design element makes a reliable assessment of its potential rather difficult. Both Dicheva et al. (2015) and Sümer and Aydın (2018) also argue that progress bars are a niche in gamification research compared to other game design elements.
Progress bars can be used to represent objectives in a comprehensible way and also provide subjects with graphical information about the (partial) success of the objective (Sailer et al., 2013). Current studies investigating progress bars (Geelan et al., 2015) do not yet assess the individual effect of this game design element on motivation. Only recently was there an exception to this practice: a study with 185 subjects in the context of the theory of gamified learning mostly failed to find statistically significant results for the progress bar and for a combination of progress bar and badges (Garcia-Marquez & Bauer, 2021).

Narrative
Narrative as a game design element (also known as story or storytelling) is also essential for gamification, in particular when it comes to learning or giving instructions, as it gives meaning to things (Kapp, 2012). A narrative is usually a story that guides a user by providing textual or spoken information. It can be used to understand goals or give meaning to an activity or task.
In addition, a narrative can create or enhance a sense of social or emotional experience (Nah et al., 2013). Narratives can be used to add meaning to other game design elements and encourage users to become more involved (Grobelny et al., 2018). In non-game contexts, narratives can play an important role because they can change the meaning of activities in the real world (Sailer, Hense, Mayr, et al., 2017). In physical education, for example, climbing on equipment and skillful jumping from one place to another can be much more motivating when "the ground is made of lava." When designing a narrative for a non-game context, care should be taken to create a reference to the real situation and to align the competence and meaning of the story with the context (Kapp, 2012).

Method
For this study, a field experiment was conducted with a between-subjects design. At the beginning of the experiment, each subject was randomly assigned to one condition, in which the subject remained for the entire experiment. Subsequently, voluntary information on age and gender was requested before the start of the questions. The experiment comprised a total of seven experimental conditions. Besides the control group (CG), which did not contain any game design element, six experimental groups with individual game design elements and combinations were created. Three of these experimental conditions were each equipped with an individual game design element: feedback (FB), progress bar (PB), and narrative (NR). We did not examine the badge game design element individually because there is a large body of research supporting the positive effect of badges for gamification (Antin & Churchill, 2011; Bräuer & Mazarakis, 2019; Gibson et al., 2015; Hakulinen et al., 2015; Hamari, 2017; Kyewski & Krämer, 2018; Mazarakis & Bräuer, 2018, 2020a). A combination of the feedback element and another game design element was chosen for each of three further groups: progress bar and feedback (PB + FB), narrative and feedback (NR + FB) and badges and feedback (BA + FB). This gives us the opportunity to test both individual game design elements and combinations. The assumption when designing the combinations was that the feedback condition would have the weakest effect of the four elements, as this condition is the least likely to be perceived as a game element, although it is indeed classified as such in the scientific literature (Geelan et al., 2015; Kapp, 2012, 2014a, 2014b; Mazarakis, 2015; Zichermann & Cunningham, 2011). An increase in motivation through the combination with one of the other game design elements, if any, would be most likely to be expected here. This also strengthens the validity of the individual elements because now a comparison is possible.
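The between-subjects assignment described above can be illustrated with a small sketch. It is not the software used in the experiment: the text only states that assignment was random, so drawing uniformly with `random.choice`, the dictionary layout and the function name `assign_condition` are assumptions made for illustration. The condition labels and the sample size of 505 are taken from the article.

```python
import random
from collections import Counter

# The seven conditions named in the text, mapped to the game design
# elements each one displays (the control group displays none).
CONDITIONS = {
    "CG": set(),
    "FB": {"feedback"},
    "PB": {"progress bar"},
    "NR": {"narrative"},
    "PB+FB": {"progress bar", "feedback"},
    "NR+FB": {"narrative", "feedback"},
    "BA+FB": {"badges", "feedback"},
}


def assign_condition(rng: random.Random) -> str:
    """Draw one condition uniformly at random (assumed scheme)."""
    return rng.choice(list(CONDITIONS))


# Simulate the assignment of 505 subjects; each subject keeps the drawn
# condition for the whole session, as in a between-subjects design.
rng = random.Random(42)
assignments = [assign_condition(rng) for _ in range(505)]
print(Counter(assignments))
```

Simple randomization of this kind does not guarantee equal group sizes, so the counts per condition will generally differ slightly.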
Given that the individual groups and combinations have been defined, the corresponding hypotheses are set out in the following section.

Hypotheses
The research question of our experiment is to find out whether the number of questions answered in a quiz can be increased by using, ceteris paribus, different game design elements. Based on stated related work, the following six hypotheses are specified to address this research question.
The hypotheses H1-H3 assume that a motivating effect can already be achieved by applying only one individual game design element. This assumption has so far not been studied well enough in research (Mekler et al., 2013).
Hypothesis 1 (H1): The subjects in the feedback condition answer more questions than the control group (FB > CG)
Hypothesis 2 (H2): The subjects in the progress bar condition answer more questions than the control group (PB > CG)
Hypothesis 3 (H3): The subjects in the narrative condition answer more questions than the control group (NR > CG)
Of further interest is whether additive effects may arise from the combination of several game design elements, which is a common practice in gamification research but often difficult to interpret (Sailer et al., 2013). The hypotheses H4-H6 will shed light on whether and to what extent a combination of several game design elements can efficiently support motivation. Usually the effect of feedback, in particular if it is just a right/wrong feedback, is taken for granted. With our study we can estimate the additional quantitative effect, if there is any, in comparison to the feedback condition alone.
Hypothesis 4 (H4): Subjects with the combination of progress bar and feedback will answer more questions than just subjects in the feedback condition (PB + FB > FB)
Hypothesis 5 (H5): Subjects with the combination of narrative and feedback will answer more questions than just subjects in the feedback condition (NR + FB > FB)
Hypothesis 6 (H6): Subjects with the combination of badge and feedback will answer more questions than just subjects in the feedback condition (BA + FB > FB)
It must be noted that the hypotheses H4, H5, and H6 are relevant to assess the additional gain by combining two game design elements. Previous statistical significance vs. the control group is assessed and necessary to check these three (H4, H5, and H6) hypotheses.
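All six hypotheses are directional comparisons of the number of questions answered in two groups. This section does not state which inferential test is used, so the following sketch swaps in a generic one-sided permutation test on the difference in means purely as an illustration. The function name, its parameters and both data lists are hypothetical; the counts are made up and are not results of the study.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def one_sided_permutation_test(treatment, control, n_permutations=10_000, seed=0):
    """Estimate P(mean difference >= observed) under random relabeling."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up numbers of answered questions, for illustration only.
fb_counts = [62, 48, 75, 90, 55, 81, 67, 70]
cg_counts = [30, 45, 28, 52, 41, 36, 33, 47]

p_value = one_sided_permutation_test(fb_counts, cg_counts)
print(f"one-sided p-value for FB > CG: {p_value:.4f}")
```

The same function could be applied to H4-H6 by passing a combined condition as `treatment` and the feedback condition as `control`.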

Procedure
An online quiz was developed to experimentally test the hypotheses presented in the previous section. The quiz consists of questions about continents, countries, and space. Some studies have already used gamified quizzes or tests to investigate the effect of gamification (Bevins & Howard, 2018; Cheong et al., 2013). An advantage of our study design as a between-group experiment is that the motivation of the subjects can be objectively measured and compared and is not based on a subjective self-assessment via a survey.
The quiz consists of 190 questions in total. Each question is provided with four possible answers, of which only one is correct. The questions of the quiz are divided into nine thematic blocks. Seven blocks focus on ten general questions each about the seven continents (in total 70 questions). In addition, we asked five questions each about four different countries of each continent (Europe: Germany, Greece, Italy and United Kingdom; Asia: China, India, Japan and Russia; Africa: Egypt, Nigeria, South Africa and Madagascar; North America: Canada, USA, Mexico and Caribbean; South America: Brazil, Ecuador, Argentina and Peru), excluding the two continents Australia and Antarctica (in total 100 questions). The final two sections of the quiz each contain ten questions about the moon and space (in total 20 questions). All questions are displayed in each condition in the same order to ensure comparability between the experimental conditions. An example of a question is shown in Figure 1.
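The composition of the quiz can be verified with a short calculation. The sketch below encodes the blocks exactly as listed in the text; the variable names and the data layout are our own, and the order of the entries is not meant to reflect the order in which blocks were presented.

```python
GENERAL_QUESTIONS_PER_CONTINENT = 10
QUESTIONS_PER_COUNTRY = 5
QUESTIONS_PER_SPACE_BLOCK = 10

CONTINENTS = [
    "Europe", "Asia", "Africa", "North America",
    "South America", "Australia", "Antarctica",
]

# Country questions exist for five continents only; Australia and
# Antarctica are excluded, as stated in the text.
COUNTRIES = {
    "Europe": ["Germany", "Greece", "Italy", "United Kingdom"],
    "Asia": ["China", "India", "Japan", "Russia"],
    "Africa": ["Egypt", "Nigeria", "South Africa", "Madagascar"],
    "North America": ["Canada", "USA", "Mexico", "Caribbean"],
    "South America": ["Brazil", "Ecuador", "Argentina", "Peru"],
}

SPACE_BLOCKS = ["Moon", "Space"]

general = len(CONTINENTS) * GENERAL_QUESTIONS_PER_CONTINENT
country = sum(len(names) for names in COUNTRIES.values()) * QUESTIONS_PER_COUNTRY
space = len(SPACE_BLOCKS) * QUESTIONS_PER_SPACE_BLOCK
total = general + country + space

print(f"general: {general}, country: {country}, space: {space}, total: {total}")
```

The three parts add up to 70 + 100 + 20 = 190 questions, and the seven continent blocks plus the two space blocks give the nine thematic blocks.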
The subjects are not informed of the total number of questions in the quiz in any of the conditions. The basic idea of the quiz is to let the subjects answer questions as long as it is fun for them, in order to measure the effect of the various game design elements on the basis of the number of questions answered.
If a subject wants to finish the quiz, this is possible by pressing the "Quit" button. The quiz also ends when all 190 questions have been answered. It is not possible to pause the quiz and to resume later. After completing the quiz, the subjects are told how many questions they have answered in total and how many of their answers were correct. This is not communicated to the subjects in advance in any experimental condition, so as not to motivate them to take the quiz beyond the effect of the game design elements. In addition, the subjects have the opportunity to provide anonymous comments via a form at the end of the experiment.
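The dependent variable, the number of questions answered before quitting, can be sketched as a small session object. This is an illustrative model of the rules described above, not the implementation used in the experiment; the class `QuizSession` and its method names are hypothetical.

```python
from dataclasses import dataclass

TOTAL_QUESTIONS = 190  # stated in the text


@dataclass
class QuizSession:
    """Tracks one subject's run through the quiz (hypothetical model)."""

    answered: int = 0
    correct: int = 0
    finished: bool = False

    def answer(self, is_correct: bool) -> None:
        """Record one answered question; the quiz ends after question 190."""
        if self.finished:
            raise RuntimeError("quiz already finished")
        self.answered += 1
        self.correct += int(is_correct)
        if self.answered == TOTAL_QUESTIONS:
            self.finished = True

    def quit(self) -> None:
        """Model the Quit button: the session cannot be resumed afterwards."""
        self.finished = True

    def summary(self) -> str:
        """Text shown only after the quiz has ended."""
        return (
            f"You answered {self.answered} questions, "
            f"{self.correct} of them correctly."
        )


session = QuizSession()
for is_correct in [True, False, True]:
    session.answer(is_correct)
session.quit()
print(session.summary())
```

The value of `answered` at the end of a session corresponds to the measure that is compared across conditions.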

Design of conditions and gamification
In this section we explain in detail the different conditions and the design of the gamification. Also, we provide examples for the experimental conditions.

Control group
The control group did not include any of the following game design elements (feedback, badges, narrative and progress bar). For clarification, there was no feedback if a question was answered correctly or incorrectly, as this is considered a game design element in this experiment (Kapp, 2012, 2014b; Mazarakis, 2015; Zichermann & Cunningham, 2011). Using an unmixed control group, it is possible to calculate the actual effect of each game design element. In addition, the intrinsic value of the quiz is made visible.

Feedback
For the implementation of the feedback element a "right-wrong" feedback is chosen. This can be considered as extrinsic feedback because our "right-wrong" feedback is provided by an external source and does not occur naturally (Patchan & Puranik, 2016). After a subject confirms their chosen answer to a question by clicking the "Next question" button, the answer is immediately colored green (correct answer) or red (wrong answer). In addition, if the answer is incorrect, the correct solution is highlighted in green color. Figure 2 gives a visual example of the feedback condition. This is in line with confirmational feedback, considered to be a game design element (Kapp, 2014b; Kapp et al., 2014; Mazarakis, 2015).
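The coloring rule of the feedback condition can be written down in a few lines. The following sketch only mirrors the rule as described in the text; the function `feedback_colors`, its signature and the example question are hypothetical and do not reproduce the code or the items of the actual quiz.

```python
def feedback_colors(options, chosen, correct):
    """Map each answer option to 'green', 'red' or None (no highlight)."""
    if chosen not in options or correct not in options:
        raise ValueError("chosen and correct must be among the options")
    colors = {option: None for option in options}
    # The correct solution is always shown in green: either because it was
    # chosen, or because it is revealed after a wrong answer.
    colors[correct] = "green"
    if chosen != correct:
        colors[chosen] = "red"
    return colors


# Hypothetical example question with four possible answers.
options = ["Berlin", "Rome", "Athens", "London"]
print(feedback_colors(options, chosen="Rome", correct="Berlin"))
print(feedback_colors(options, chosen="Berlin", correct="Berlin"))
```

In the control group and in the conditions without feedback, no such coloring would be applied.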

Badges
For the experimental condition badges 14 different badges were designed. They can be seen in Table 1.
Eight of the 14 badges were only awarded if a specific question had been answered correctly. One of the badges was implemented as a level badge (bronze/silver/gold), as it is often found in other studies (Hamari, 2013; Kyewski & Krämer, 2018) and games. The remaining badges were unlocked when a particular question was answered correctly or for several correct answers in a row. At first, the badges are only visible as 14 gray circles displayed in a box above the current question. Before the subjects unlock one of the badges, it is not clear to them that they can obtain a badge. In addition, it is not clear what conditions the subject has to meet to get a badge. This approach was chosen to incentivize only the behavior of answering questions and not the effort to achieve a goal. If we incentivized the targeted collection of badges through planned behavior by the subjects, we would be conforming to the ideas of goal setting theory (Locke, 2001; Locke & Latham, 2002). The goals set by the badges could then in turn lead to subjects working precisely up to a certain question to achieve a certain badge. On the one hand, this behavior would counteract the experimental design pursued in this study, in which the motivation of the subjects is measured based on the questions answered. On the other hand, goals are implicit components of other game design elements like achievements and quests (Werbach & Hunter, 2012), which individuals could perceive as a further game design element that would confound our approach of researching the effect of individual game design elements.
The design of the badges is aligned with the findings of Facey-Shaw et al. (2018). As suggested by the authors, various linear and non-linear badges that may be acquired progressively after achieving specific goals are incorporated in the badge design of our study (Facey-Shaw et al., 2018, p. 538). This is implemented in our experiment by providing badges for each continent (linear) and badges without relation to a continent (non-linear). In addition, we consider two aspects relevant for our badge design. First, we want to support subjects in achieving mastery (Abramovich et al., 2013). This is the case for the first six badges in Table 1. Second, learning goals and reinforced learning shall be supported in an unspecific manner by the 14 badges (Botra et al., 2014; Davidson & Candy, 2016). So, in general, we designed the badges to reflect advancement as milestone badges and as outcome-based badges when answering some specific questions correctly (Facey-Shaw et al., 2018, p. 539).
Finally, we were careful about the visual representation of the badges. Although visual appeal may not be significant in terms of the success of the gamification, it can still have an impact on interaction with the gamified system (Facey-Shaw et al., 2018, p. 540). After receiving a badge, subjects could find out why it was awarded by hovering the mouse pointer over the badge. As soon as the mouse hovers over one of the displayed icons, a tooltip field appears, which explains what the respective badge represents. In addition, a pop-up window informs the subjects that they have just received a badge as soon as they unlock it.
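The unlocking rules described above (badges tied to specific questions and badges for several correct answers in a row, with conditions hidden until a badge is awarded) can be illustrated with a minimal sketch. This is not the implementation used in the experiment: the class `BadgeTracker`, the question IDs, the streak length, and the badge names are all hypothetical and serve only to make the logic concrete.

```python
from dataclasses import dataclass, field


@dataclass
class BadgeTracker:
    """Tracks hidden badge conditions. All IDs, thresholds and names are hypothetical."""
    question_badges: dict  # question id -> badge name, awarded on a correct answer
    streak_badges: dict    # required run of correct answers -> badge name
    unlocked: list = field(default_factory=list)
    streak: int = 0

    def record_answer(self, question_id, correct):
        """Update the streak and return the badges newly unlocked by this answer."""
        self.streak = self.streak + 1 if correct else 0
        candidates = []
        if correct and question_id in self.question_badges:
            candidates.append(self.question_badges[question_id])
        if self.streak in self.streak_badges:
            candidates.append(self.streak_badges[self.streak])
        # A badge is awarded only once; the subject learns of it only at this point.
        new = [badge for badge in candidates if badge not in self.unlocked]
        self.unlocked.extend(new)
        return new


# Hypothetical configuration: one question badge and one streak badge.
tracker = BadgeTracker(
    question_badges={7: "Example question badge"},
    streak_badges={3: "Example streak badge"},
)
```

Because `record_answer` returns only the newly unlocked badges, a caller could use the return value to trigger the pop-up window and to color the corresponding gray circle.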

Narrative
As Keusch and Zhang (2017) assume based on their literature review, a narrative used for gamification should be chosen to be as appropriate to the topic as possible. Based on this assumption, a story with an alien was chosen to match the topic "Earth and Space." This alien was sent to Earth by a space commission to check whether planet Earth should give way to a space highway. The subject is now asked to answer questions to convince the alien that the Earth is worth saving. Suitable pictures were created for all continents and countries covered in the questions to support the story visually. When switching to the next continent or country, a new image and text were displayed with an introductory commentary by the alien. As an intermediate event, a block of questions about the moon was added. The alien tells the subject that the moon is about to be blown up to make room for the construction engines. To prevent this, questions must again be answered. The intermediate event is intended to make the story more engaging. The subject receives no information about the outcome of the story. This is to prevent subjects from participating in the quiz again to change the outcome of the story. Though this would not harm the subject, the data gathered would not be suitable for analysis because the subjects would have some previous knowledge of the questions and might also remember the correct answers. This is undesirable for the purpose of our study, which is to find out the motivational potential of individual and combined game design elements. Kapp (2012) also notes that the integration and design of a narrative does not depend on the end of the story but on the journey through it. Figure 1 shows an example of what the narrative looks like, including a picture of the alien.

Progress bar
For the progress bar, a round design in the form of a filling globe is chosen, closely related to the topic of the quiz. The image of the earth is uncovered in 20 steps. The subjects see at the top of the progress bar how many questions they still have to answer before the next piece of the globe is uncovered. The number of questions necessary for the next piece does not increase linearly but varies between 3 and 15 questions. Intervals of different sizes were chosen to achieve higher motivation through irregular progress. In contrast to regular progress, irregular progress has a higher motivation potential in gamification (Zichermann & Cunningham, 2011). Table 2 shows the numbered intervals and the number of questions that must be answered in each.
In addition, this procedure helped to hide the total number of questions available in the quiz and thus avoided providing an additional motivational element, which could again be viewed as goal setting (Locke, 2001; Locke & Latham, 2002). In total, the 190 questions were distributed over 20 pieces of the progress bar. Figure 3 shows a quiz question in the progress bar condition.
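The mapping from answered questions to uncovered pieces can be sketched as follows. The interval sizes below are invented for illustration and only respect the constraints stated in the text (20 pieces, between 3 and 15 questions each, 190 questions in total); the actual values are those in Table 2. The names `INTERVALS`, `THRESHOLDS` and `progress` are likewise assumptions.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical interval sizes: 20 pieces, each needing 3 to 15 questions, 190 in total.
# These are NOT the values from Table 2.
INTERVALS = [3, 5, 7, 9, 11, 13, 15, 12, 10, 8, 6, 4, 14, 15, 13, 11, 9, 7, 10, 8]

# Cumulative number of answered questions at which each piece is uncovered.
THRESHOLDS = list(accumulate(INTERVALS))


def progress(answered):
    """Return (pieces uncovered, questions left until the next piece)."""
    pieces = bisect_right(THRESHOLDS, answered)
    if pieces == len(THRESHOLDS):
        return pieces, 0  # globe fully uncovered
    return pieces, THRESHOLDS[pieces] - answered
```

Only the second value of the returned pair would be shown to the subject, so the display reveals the distance to the next piece but not the total number of questions.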

Subjects
The experiment was carried out for one month and subjects were recruited online via social networks, such as Facebook and Xing. No remuneration was paid for their participation, nor was any such remuneration advertised. In the end, 531 participants took part in the study. Of these, 20 were removed due to duplicate IP addresses and session IDs, because it could be assumed that these subjects took part in the experiment a second time. The results of repeated participation cannot be evaluated for two reasons. First, the subject may be randomly assigned to a different experimental condition. This would confound the results because the subject could now recognize that there is more than one experimental condition and thus react differently. Second, it can be assumed that the subject's motivation is affected if the subject already knows the questions in the quiz: subjects could be motivated if they can answer more questions correctly, or they could be demotivated if the same questions always appear in the same order. Because the direction of such effects cannot be reliably estimated, the complete removal of these subjects is the only way to interpret the results objectively. Six further entries were excluded from the analysis due to contradictory information (for example, subjects who stated that they were 99 years old) or other conspicuous behavior, such as going through the experiment too fast without paying attention to the actual questions. The excluded subjects were distributed over the seven experimental conditions as follows: 5 CG, 4 FB, 3 PB, 6 PB + FB, 3 NR, 2 NR + FB, and 3 BA + FB. These 26 subjects are spread almost equally across the conditions without any statistically significant abnormality. This left a total of 505 subjects for subsequent statistical evaluation.
The allocation of subjects to the experimental conditions was randomized and each subject stayed in their assigned experimental condition. It was not possible for a subject to change groups. Since all subjects reached the quiz via the same internet URL, the subjects were not aware that there was more than one condition available. At the beginning of the quiz, the subjects were informed about the anonymization of the data and about the fact that a "Quit" button had to be pressed to send the data to the server. It was also pointed out that questions of the quiz should only be answered as long as it was fun for the individuals. This was also pointed out when advertising the study to ensure that each subject was aware that the quiz did not have to be fully answered. Subsequently, voluntary information on age and gender was requested before the start of the questions. After finishing the quiz, the subjects were also given the option to provide an open comment about the study via a form.
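The exclusion counts reported above can be checked with a few lines of arithmetic. All numbers are taken from the text; only the variable names are our own.

```python
recruited = 531
duplicates = 20   # duplicate IP addresses and session IDs
implausible = 6   # contradictory information or conspicuous behavior

# Excluded subjects per experimental condition, as listed in the text.
excluded_by_condition = {
    "CG": 5, "FB": 4, "PB": 3, "PB + FB": 6, "NR": 3, "NR + FB": 2, "BA + FB": 3,
}

excluded = duplicates + implausible
remaining = recruited - excluded

# The per-condition counts must add up to the total number of exclusions.
assert sum(excluded_by_condition.values()) == excluded
```

Here `excluded` evaluates to 26 and `remaining` to 505, matching the sample size used in the analysis.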

Results
Our sample consists of 61% (308) women and 34% (172) men; 5% (25) did not provide a gender statement. The mean age of all subjects is 31.47 years (SD 12.98). The median is 27 years, and ages range from 14 to 78 years. The differences in age between women and men were analyzed and are not statistically significant. The aim of the study was to compare the motivation of the subjects between the different experimental conditions. The motivation of the subjects was measured by the number of questions answered. First the descriptive results and then the inferential statistical results are presented.

Descriptive results
The distribution of the subjects in each of the experimental conditions is shown in Figures 4 and 5. From these distributions we can observe after how many questions the subjects in each condition quit the quiz. In all conditions, many subjects either quit the quiz early (after about 30 to 50 questions) or answered all 190 questions, even in the control group.
Particularly striking are the two experimental conditions with the progress bar (PB, PB + FB) and the group with the badges (BA + FB). Exactly 50.0% of the subjects in the condition with the progress bar and feedback answered all questions, whereas this is the case for 39.3% in the condition with only the progress bar. In the badge and feedback condition, it is also remarkable that 41.4% of the subjects answered all 190 questions. It can already be stated that the progress bar and the badges, always in combination with feedback, motivated the subjects best to answer all questions.
The mean value of answered questions over all conditions is 98.79, with a standard deviation of 74.29. A total of 180 subjects (35.64%) answered all questions in the quiz. Table 3 shows the number of subjects, mean value of answered questions, associated standard deviation (SD), p-value and the number of subjects who answered all questions per condition. On average, the subjects answered 56.37% of the questions correctly. Because the questions were generally challenging rather than easy to answer, it was examined whether the proportion of correctly answered questions might have had an influence on the point at which a subject finished the quiz. Because the data are not normally distributed, a Spearman correlation was calculated to explore this aspect. The analysis reveals a low correlation without statistical significance between the number of questions answered and the proportion of questions answered correctly by an individual, r_s = 0.08, p = 0.062. Thus, it can be assumed that the difficulty of the questions had no significant influence on the motivation of the subjects.
The mean value of unlocked badges was 6.67 (SD 3.68). Four subjects did not unlock any of the badges and four other subjects managed to get 12 out of 14 possible badges. None of the 57 subjects in the corresponding group obtained all 14 badges. Table 1 also shows how often each badge was obtained.
It can already be seen from the descriptive results that the subjects in the control group answered fewer questions on average than those in the other experimental conditions. For a more precise and reliable interpretation of the results, the inferential statistical analysis follows.
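For readers unfamiliar with the Spearman correlation used here, the following sketch shows how the coefficient is obtained: both variables are converted to ranks (ties receive the mean of their ranks) and the Pearson correlation of the ranks is computed. The data are invented and the function names are our own; the sketch does not reproduce the reported value and omits the significance test.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: questions answered and share answered correctly per subject.
answered = [30, 45, 190, 60, 190, 25]
share_correct = [0.50, 0.62, 0.55, 0.48, 0.70, 0.58]
rho = spearman(answered, share_correct)
```

Working on ranks makes the measure robust to the strongly non-normal distribution of answered questions, where many subjects sit at the maximum of 190.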

Inferential results
An analysis of variance is conducted for the statistical evaluation of the results to be able to interpret the mean value comparisons between the different groups. The test for the homogeneity of the variances (Levene test) for the number of answered questions is statistically significant with p < 0.001 (Levene statistic = 7.37). All subsequent results are therefore based on unequal variances and conservatively corrected accordingly. This also leads to non-integer values for the degrees of freedom.
The analysis of variance indicates a statistically significant difference between the individual experimental conditions, F(6, 498) = 5.68, p < 0.001. Since the homogeneity of the variances is not given, the results must be corrected by the Welch test (Field, 2009). After the adjustment, the result of the analysis of variance is 6.51, p < 0.001. Due to the statistically significant difference in the number of answered questions between the individual experimental conditions, all hypotheses can now be statistically examined and tested one-tailed.
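The Welch correction mentioned above can be illustrated with a short sketch of the standard textbook formula for Welch's unequal-variance one-way analysis of variance. It is not the software procedure used for the reported results; the function name `welch_anova` and the example data are assumptions, and the p-value (which would be read from an F distribution with the returned degrees of freedom) is omitted.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's unequal-variance one-way ANOVA."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]  # weights n_i / s_i^2
    w_total = sum(w)
    grand = sum(w_i * m_i for w_i, m_i in zip(w, m)) / w_total  # weighted grand mean
    between = sum(w_i * (m_i - grand) ** 2 for w_i, m_i in zip(w, m)) / (k - 1)
    lam = sum((1 - w_i / w_total) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)  # generally not an integer
    return f_stat, k - 1, df2


# Hypothetical numbers of answered questions in three conditions.
example = [[20, 35, 40, 62, 190], [30, 80, 150, 190, 190], [45, 60, 190, 190, 190]]
f_stat, df1, df2 = welch_anova(example)
```

Each group is weighted by its size divided by its variance, so groups with very dispersed counts contribute less to the comparison of means.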
The comparison of the mean values of the feedback group and the control group (H1) provides a statistically significant result, t(121.01) = 1.70, p = 0.046, D = 0.31. Consequently, H1 can be supported and it can be assumed that the game design element feedback motivates the subjects.
The comparison of the mean value of the progress bar group and the mean value of the control group (H2) also shows a statistically significant result, t(165.26) = 3.57, p < 0.001, D = 0.62. Thus, it can be assumed that the progress bar has a motivating effect because the subjects answered more questions in the progress bar condition; H2 is supported.
The subjects in the condition with the narrative also answered statistically significantly more questions than those of the control condition (H3), t(137.47) = 3.91, p < 0.001, D = 0.69. Consequently, H3 can also be supported and it can be assumed that narratives also have a motivating effect. In summary, this study provides support that each individual game design element has a positive effect on the motivation to answer questions in a quiz.
Now we examine the results for the combinations of game design elements. It is already obvious from the descriptive results that the differences between the control group and the experimental conditions with the combinations are significant. Therefore, the presentation of these significances is omitted and the analysis of the hypotheses H4, H5, and H6 is performed immediately. A statistically significant difference between the mean values was found for the combination of progress bar and feedback compared to the condition with feedback only (H4), t(133.15) = 2.51, p = 0.007, D = 0.84. The comparison between the conditions with narrative and feedback and feedback only (H5) also showed a statistically significant difference, t(131.19) = 1.92, p = 0.029, D = 0.70. The third tested combination of badge and feedback also shows a statistically significant difference when comparing the mean value with the condition with feedback only (H6), t(114.58) = 3.03, p = 0.002, r = 0.27. All three conditions that used another game design element in addition to the feedback element provide significant results when comparing the mean values with those of the group that used only the feedback element. Consequently, H4, H5, and H6 can also be supported. Combining two game design elements led to more answered questions than feedback alone.
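The non-integer degrees of freedom in the t statistics above are characteristic of the unequal-variance (Welch) t-test with Welch-Satterthwaite degrees of freedom. The following sketch shows how such a t value and its degrees of freedom are computed for two groups. The data and the function name `welch_t` are hypothetical; the one-tailed p-value would be read from a t distribution with the returned degrees of freedom and is not computed here.

```python
from statistics import mean, variance


def welch_t(treatment, control):
    """Return (t, df) for Welch's t-test; t > 0 means the treatment group answered more."""
    a = variance(treatment) / len(treatment)  # squared standard error, treatment
    b = variance(control) / len(control)      # squared standard error, control
    t_stat = (mean(treatment) - mean(control)) / (a + b) ** 0.5
    # Welch-Satterthwaite approximation; usually yields a non-integer value.
    df = (a + b) ** 2 / (a ** 2 / (len(treatment) - 1) + b ** 2 / (len(control) - 1))
    return t_stat, df


# Hypothetical numbers of answered questions in a gamified and a control condition.
gamified = [10, 50, 120, 190, 190]
control = [5, 20, 30, 45, 60]
t_stat, df = welch_t(gamified, control)
```

The approximated degrees of freedom always lie between the smaller group size minus one and the pooled value n1 + n2 - 2, which is why the reported values fall below the totals a classical t-test would use.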

Discussion
The motivation for this study was to assess how individually applied game design elements affect subjects' motivation. By finding support for the corresponding hypotheses H1, H2, and H3, we could show that individual game design elements can be enough to motivate individuals. This needs to be considered because the creation of two or more additional game design elements is usually associated with additional costs. However, we must not neglect that a combination of two game design elements can support the motivation of subjects even further, as shown by the support for our hypotheses H4, H5, and H6.
It also remains questionable whether two or more game design elements should be combined, because the additional mean number of questions answered in our experiment is not high enough to justify this effort. We find an increase of less than 1% when comparing the narrative condition with the narrative and feedback condition. For the progress bar versus the progress bar and feedback condition, the increase is at least 13%. Careful consideration is necessary as to whether an increase in motivation justifies the additional effort.
For the implementation of the narrative element, a story was specifically designed to suit the theme of the questions. As the results have shown, the game design element proved its motivating effect. However, it must be strongly assumed that the effectiveness of a narrative depends very much on whether and how the context of the study is adequately considered (Kapp, 2012). Though context is not an aspect unique to narrative (Finckenhagen, 2017), it is important to have a story with which most stakeholders are pleased. The goals of the gamified system and stakeholder objectives need to be aligned, and at the same time one must keep in mind the constraints of the organization or the experimental setting (Richards et al., 2014).
Based on our results, the use of the progress bar and badges in particular can be recommended to increase motivation. The results on the motivational effect of badges go hand in hand with most results of previous studies (Hakulinen et al., 2015; Hamari, 2017; Koivisto & Hamari, 2019).

Limitations and outlook
As with any other experiment, our study has some limitations. First, context is very important in gamification research (Finckenhagen, 2017; Richards et al., 2014). Not only gamification in general, but also the effect of the game design elements, depends strongly on the context, systems or in general on the environment in which they are used (Finckenhagen, 2017; Hallifax et al., 2019; Koivisto & Hamari, 2019; Richards et al., 2014). For example, the context is essentially different if we want to gamify online shopping vs. learning. In the first case someone wants to increase the website activity, while in the second case the emphasis is on keeping the learning focus high (Liu et al., 2017). This makes it evident that the results of this study cannot be generalized if the context is not considered. Stakeholders or organizations can differ, which can also influence the effect of gamification. Along the same lines, many activities are not intrinsically interesting and require significant effort and persistence. Maintaining long-term user commitment through gamification is a recognized challenge. Whether the shown effects would last over time in other real-life activities is unclear. Moreover, the transfer of a game design element from one context into another does not necessarily lead to the same motivational experiences. We still believe that our sample of 505 subjects is big enough to draw certain conclusions from the results, but they can vary depending on the context.
We have also refrained from analyzing the game design element badge individually, as the existing research is noticeably clear in this respect (Antin & Churchill, 2011; Bräuer & Mazarakis, 2019; Gibson et al., 2015; Hakulinen et al., 2015; Hamari, 2017; Kyewski & Krämer, 2018; Mazarakis & Bräuer, 2018, 2020a). Nevertheless, because badges were not tested individually, it is theoretically possible, although it seems rather unlikely, that badges without the feedback game design element would have had either no positive effect or a much greater effect in our experiment. The possibility of sharing badges via social media should also be of interest for future research, as this might additionally enhance the effect of gamification (Sheffler et al., 2020) through public display or comparison with others. Finally, testing known vs. unknown badges is a promising research question in order to figure out if a combination of badges and goal setting is more successful than badges alone.
Further limitations offer additional potential for future research, e.g., the occurrence of the ceiling effect. When designing the study, it was expected that a quiz with 190 questions would consist of enough questions, so that most of the subjects would quit before reaching the final question. Contrary to this assumption, however, in most groups at least one-third of the subjects answered all questions. This can lead to an impairment of the results, since it is not possible to find out for these subjects at which point the motivation decreased. In future studies, the number of questions in such a quiz should therefore be significantly increased. To counteract the ceiling effect, we suggest for future studies a minimum of 300 questions. Of course, the results are still valid because they underpin the high potential for motivation; the ceiling effect only limits the possible range of interpretation.
A concurrent shortcoming and strength of the experiment is the clear focus of the individual conditions. It was expected that the control group would answer only a few questions of the quiz and that almost no one would reach the end of the 190 questions. The task in the control group was just to answer question after question, without getting any information on whether the answers were correct or wrong and without the application of any game design element. Still, it appears that answering a quiz possesses a high intrinsic value for subjects. On average, more than 62 questions were answered and 15% of the subjects in the control group answered all 190 questions. So even without knowing whether the questions were answered correctly or not, the subjects did not perceive the task as demotivating. This is a key finding that must be considered when designing future quiz studies.
In the scientific literature, a quiz can be a game element, a learning element, or an affordance in gamification (Berger et al., 2019; Koivisto & Hamari, 2019; Saxegaard & Divitini, 2019). If we also consider a quiz to be a game design element, it is possible that the quiz setting positively influences the effect of the other game design elements, as it is assumed that game design elements influence each other. However, we can only assume that this is the case. Even if this were the case, our findings are still valid because we applied a between-subjects and randomized experimental design, including the ceteris paribus clause to exclude additional interactions and interfering factors. Furthermore, because we have a quiz setting in all conditions, we can safely statistically control for this variable.
It is possible that the subjects of our study expected some game design elements as complimentary in a quiz setting. To answer this question, we examined the anonymous comments we received from the subjects at the end of the experiment via a form. Out of 505 subjects, only two subjects would have liked feedback (one in the control group and one in the progress bar condition), and another subject would have preferred a summary feedback per continent (combined progress bar and feedback condition). As a result, we have no indication that the subjects, in general, expected the existence of certain game design elements.
Another limitation of the study occurs in the design of the progress bar. A main objective of the study design was to hide the maximum number of questions available in the quiz. This was done to avoid the appearance of goal setting (Locke, 2001; Locke & Latham, 2002). We wanted to avoid subjects trying to answer all the questions in the quiz merely to fulfill an (implicit) obligation. Our research design can be considered an effort to minimize the effect of goal setting in gamification to observe the effects of our different experimental conditions. However, by using the progress bar, a rough estimate of the total number of questions is possible, which needs to be considered in the interpretation of the results. It is inherent to a progress bar that a quantitative or visual indication of progress is displayed and that it can be concluded or estimated when the maximum number of questions will be reached.
In addition, the experimental implementation of the badges condition is not as perfect as anticipated. Possible criticisms are the same as for the limitation concerning the progress bar. While we aimed not to provide explicit goals and information on how to achieve the badges, we may have unintentionally provided an implicit goal and information. The badges are initially only visible as 14 gray circles placed in a box above the current question. Since it is not conveyed what the subject must accomplish to receive a badge, it is not clear to the subject until the moment of activation that a badge can be achieved for the current activity. However, displaying the 14 gray circles might suggest to some subjects that a complete collection of all badges is possible. This, in turn, could be perceived as an implicit achievable goal, which then would function like the game design element collections (Werbach & Hunter, 2012, p. 80), and could also be seen as the game design element challenge, as, for example, in Friedrich et al. (2020, p. 348). To determine if this affected our study, we examined the anonymous comments received from the subjects at the end of the experiment. The analysis shows that there are no comments on this issue. This is, of course, not definitive proof that there were no such issues, but there is also no circumstantial evidence for them. To avoid this problem and to ensure that only the effect of the badges is measured, an implementation without gray circles would be preferable in future studies.
As in most gamification studies, the effect of the game design elements was only tested over a short period of time. So far, very few experimental studies with big samples, and not just survey studies (Warmelink et al., 2020; Xi & Hamari, 2019), have been carried out about the long-term effects of gamification, and in addition these provide mixed results (Hamari, 2017; Hanus & Fox, 2015). How the different individual game design elements perform over a longer period remains an open question that should be addressed in follow-up experimental studies.
Although it was not the actual intention, this study was able to show that by adding a second game design element, the motivation of the subjects could be additionally increased. However, only the combination of feedback and another game design element (badges, narrative or progress bar) was examined. In addition, it should be considered that a saturation effect may occur or that the motivating effect turns negative when the number of combined game design elements exceeds a certain point. Future research is needed to systematically analyze interaction effects between the individual game design elements and combinations.
Other game design elements, e.g., rankings or leaderboards, may or may not provide similar results. However, the state of research on other game design elements is not consistent. Huschens et al. (2019) investigated the effects of introducing rankings in a working environment. The authors were able to identify both positive and negative effects. On the one hand, it could be shown that leaderboards can have a motivating outcome if they are used as a global standard of comparison and thus stimulate comparative behavior. On the other hand, it could be shown that leaderboards increased the perceived pressure of the subjects. Christy and Fox (2014) also examined the game design element leaderboard separately in a virtual reality-based study. The authors show how the display of participants of different gender on a leaderboard affects the academic identification and performance of women. Differences were found between leaderboards of either male or female participants. Landers et al. (2017) examine the effect of the game design element in the context of goal setting theory. In this context, they examined how goals set with different difficulty in leaderboards affected the motivation of subjects. Another study with mixed results by Ortiz-Rojas et al. (2019) demonstrates the general need to investigate this individual game design element more closely. We agree with that and suggest well-developed research designs to investigate this and other game design elements.
One assumption made in many gamification studies is that different game design elements have different effects on basic psychological needs (Bräuer & Mazarakis, 2019; Sailer et al., 2013). Future research could address the different effects of individual game design elements on these basic psychological needs. This in turn may provide indications as to which game design elements should be combined to address all basic psychological needs and thus achieve the most motivating effect possible, considering the personality type of the subjects. Although our study uses more game design elements than the average experiments analyzed by , still more game design elements might be beneficial (Groening & Binnewies, 2021), leaving aside whether an individual instead of a joint use of game design elements would not also be sufficient (Mazarakis, 2021). This can be seen as a research gap in our present study and thus as a possible outlook for further research. Ideally, such research would use a platform where the experimental setting does not vary, but the number and quality of the different game design elements do.
A caveat is useful for understanding the feedback element of the game. In this study, confirmational feedback was explicitly used as a game design element (Kapp, 2014b; Kapp et al., 2014; Mazarakis, 2015). Although one might assume that "everything is feedback" and, for example, a progress bar is thus also feedback, it is important to consider the scientific distinction. Werbach and Hunter (2012, p. 78) differentiate three categories of game design elements in gamification: dynamics, mechanics and components. Feedback is considered a mechanic (Werbach & Hunter, 2012, p. 79), whereas, e.g., progression is considered a dynamic (Werbach & Hunter, 2012, p. 78). It is not the purpose of this study to analyze the scientific nuances of the terminology. For the sake of completeness, however, it should be pointed out that it can be discussed whether feedback is really justified as an individual game design element, although the existing literature strongly assumes that it is (Geelan et al., 2015; Kapp, 2012, 2014a, 2014b; Kapp et al., 2014; Mazarakis, 2015; McGonigal, 2011).
Finally, the implementation of the narrative element can be overly complex compared to other game design elements. Developing an appropriate story that has the potential to motivate the user is a greater challenge than designing points, levels or progress bars (Prestopnik & Tang, 2015). Still, long-term effects might be superior for the game design element narrative, in contrast to easier to implement game design elements like points, badges or leaderboards. More research about the narrative element is necessary.

Conclusion
We analyzed the results of 505 subjects regarding the individual and combined effects of several different game design elements in a quiz, namely feedback, progress bar, badges, and narrative. The different game design elements vary in complexity, from rather simple ones like feedback to more elaborate ones like narrative. The selection also takes into account established gamification elements like badges and feedback and less established elements like progress bar and narrative. We can show in this article that individual game design elements are sufficient to motivate individuals to answer questions in a quiz compared to a control group without gamification. Additionally, the game design elements progress bar and badges, always in combination with feedback, motivated the subjects best to answer all questions. The findings can be applied to similar quiz settings, for example a quiz within a gamified online training course in an enterprise (Stanculescu et al., 2016).
This study is an important milestone in the analysis of individual game design elements for gamification by applying a randomized between-group study design. We can back our findings with the experimental setting, where we could analyze each effect individually under the ceteris paribus clause. All results were achieved in the same setting without having to question internal or external validity due to different experimental settings or designs. We address the issue of insufficient empirical evidence of the impact of individual game design elements on the motivation of users. In this aspect, this article addresses an important research gap.
Our study has led to new conclusions in the field of gamification research regarding the effectiveness of individual and combined game design elements and gives support to all six hypotheses. In each condition, the assumed effect vs. a control or comparison group was confirmed. The game design elements feedback, progress bar and narrative have a significant motivating effect in our experiment when used individually. In addition, a positive effect of combined game design elements could also be demonstrated, with best results for a combination of badges and feedback. In the individual application of the game design elements, the greatest effect was achieved with the narrative element.

Disclosure statement
No potential conflict of interest was reported by the authors.