Designing gamified rewards to encourage repeated app selection: Effect of reward placement

Designers commonly use gamification to improve the frequency of engagement with apps, but often fail to consider the impact of placement on reward value. As rewards tend to depreciate if delayed (termed temporal discounting), placing a reward further into the future can significantly affect its ability to motivate behaviour. We examine the most effective placement of gamified rewards so as to reduce discounting and to increase how frequently an application is used. In two online studies, users were asked to choose between fictional budget tracking applications that varied in the placement of either monetary (N = 70) or gamified (N = 70) rewards. In both experiments we found that people more frequently used the application that provided rewards before, rather than after, the task. As predicted by temporal discounting, our work suggests that placing rewards early in the interaction sequence leads to an improvement in the perceived value of that reward, motivating further selection. We discuss the findings in the context of designing effective reward structures to encourage more frequent app engagement.


Introduction
Designers use a variety of features to drive engagement within their applications, such as social supports and motivational prompts (Elbert et al., 2016; Maher et al., 2015), or alarms and reminder notifications (Doherty et al., 2018; Stawarz et al., 2015). Gamification techniques, where game-like rewards are applied to non-game contexts (Deterding et al., 2011), are a popular way to increase engagement, and have previously been found to significantly increase frequency of use when applied to certain contexts (Johnson et al., 2016; Lewis et al., 2016). Yet, the impact of gamification on engagement is not yet clear cut, as weak experiment design and inconsistent use of psychological theory have hampered clear insights on how to most effectively design gamified applications (Seaborn and Fels, 2015).
While gamification is usually successful in motivating behaviours (Lewis et al., 2016), it does not do so consistently. As a result, the majority of gamification research focuses on verifying whether certain gamification techniques are effective in specific contexts (e.g. Mekler et al., 2013; Velez et al., 2018), rather than exploring how certain reward processes impact the effectiveness of gamified rewards. For example, temporal discounting (Ainslie, 1975; van den Bos and McClure, 2013), which describes how the subjective value of a reward is reduced based on the size of the delay experienced before presentation, is consistently shown to be an important factor in mediating the value of rewards in both animal and human studies (Paglieri, 2013; Rosati et al., 2007). Yet, temporal discounting is seldom explored in the gamification literature. As more valuable rewards create stronger motivations to choose a certain option or behaviour (Flaherty and Caprio, 1976; Green et al., 1991; Sarafino, 2004), it is important to understand how temporal discounting may influence reward value so as to optimise the impact of incentive structures when designing gamified rewards.
Our paper contributes empirically-tested, theory-driven guidelines by investigating the most effective placement of gamified rewards in order to encourage further engagement with an application. Frequent app selection behaviours provide further exposure to the app interface, allowing designers to leverage this attention to other parts of the application. According to one study conducted on smartphone users (Oulasvirta et al., 2012), frequent app-checking behaviours can act as a gateway for continued app use. Additionally, becoming more habituated to an app interface has been shown to increase the accuracy and speed of the interaction (Garaialde et al., 2020), which creates a switching cost that may demotivate users from moving to a new app.
By applying temporal discounting to reward placement design we show that, rather than placing rewards after long interactions with the app, giving users rewards directly as the app is opened makes them significantly more likely to return to the app. Critically, this effect was replicated when both money (Study 1, mirroring other cognitive research on rewards) and a points-based leaderboard (Study 2) were used as rewards. This shows how gamified rewards, like a points-based leaderboard, can be influenced by temporal discounting in a similar manner to financial rewards.
These findings have implications for the design of reward structures in a variety of disciplines, particularly in terms of gamified platforms, as they show that rewarding users as soon as they open an application can significantly increase the likelihood they will open the application again. We argue this reward structure is likely to improve the overall perceived value of selecting the application within the automatic model-free decision-making system (de Wit and Dickinson, 2009). Although further research is required to confirm that these effects hold in a more applied context, increasing this value is believed to increase the likelihood that users will select the app spontaneously (Kamphorst and Kalis, 2015; Wood and Rünger, 2016), therefore promoting more frequent use. Our results provide a valuable empirical test of theoretical predictions into how reward structures should be used when attempting to motivate users to engage with an app more frequently. We suggest that app designers who wish to use gamified rewards to motivate app use could benefit from placing rewards as close to the start of the interaction as possible, as this may encourage users to open their application more frequently.

Gamification and rewards
Gamification is a popular technique used to improve engagement rates across web applications or services, usually providing users with immediate gratification for common or desirable interactions (Deterding et al., 2011; Lewis et al., 2016; Looyestyn et al., 2017). Gamification, as it is widely applied, works by creating the type of reward structures and regular feedback mechanisms commonly found in games, in an attempt to motivate certain behaviours (Deterding et al., 2011; Hamari et al., 2014). Rewards such as points, levels, badges, quests, and leaderboards are usually paired with other types of visual feedback in order to motivate repeated engagement (Tondello and Nacke, 2018). These techniques have been shown to be successful in motivating users to open an app more frequently, spend longer amounts of time using an app, or in increasing levels of participation (Johnson et al., 2016; Lewis et al., 2016; Looyestyn et al., 2017; Seaborn and Fels, 2015). Gamification techniques are particularly popular in contexts where rewards are delayed, such as education (Barata et al., 2013) and exercise, and attempt to provide rewards in the short term in order to motivate users to stay engaged with the application. They are also very common in the context of increasing productivity of employees of business organisations (Koopmans et al., 2012), in motivating individuals to take part in citizen science activities (Eveleigh et al., 2013; Iacovides et al., 2013), and in other types of research (Lewis et al., 2016; Looyestyn et al., 2017).
While reviews of the literature suggest that gamification is generally effective at increasing engagement, there are often issues that prevent these studies from providing conclusive evidence as to why these effects work and what exactly may be driving their success (Deterding et al., 2011;Hamari et al., 2014;Lewis et al., 2016). Seaborn and Fels (2015) highlight how multiple gamified rewards are often given concurrently, making it difficult to identify the unique contribution of each type of reward or reward technique. In addition, theory is rarely used to guide or explain the design of gamification. Because of this, clear evidence-based, best-practice guidelines on how to structure gamified rewards are hard to find. The current paper aims to provide empirical support for theory-driven guidelines that advise on the most efficient placement of rewards when using gamification as a motivational tool.

How rewards affect choice
As highlighted, rewards form a core component of gamification design. A large portion of theory-based research on how rewards affect choices is framed around dual-process theories (Kahneman, 2003; Metcalfe and Mischel, 1999; Sloman, 1996). These theories divide thinking into two separate but interconnected systems, referred to as the model-free (MF) and the model-based (MB) systems (Daw et al., 2011; Gläscher et al., 2010). Other names for these systems include System 1 and 2 by Kahneman (2003), Hot and Cool systems by Metcalfe and Mischel (1999), and Associative and Rule-based systems by Sloman (1996). The MF system supports fast, automatic decision making and is heavily reliant on successful past experience to guide decisions. On the other hand, the MB system supports slower, more conscious, and deliberate decision making, whereby decisions are influenced by a predictive model of possible future actions and their related outcomes.
The MF system is particularly sensitive to the magnitude and timing of rewards, relying almost exclusively on previous experience when making decisions (Wise, 2004). As such, any changes to these variables have a drastic effect on how the MF system processes reward information (Kobayashi and Schultz, 2008). The MF system is also frequently described as the default decision-making process, with the MB system only exerting periodic influence when required (Evans, 2007). Because of this, designing an incentive structure that targets the MF system's sensitivity to perceived reward magnitude and timing may be an effective strategy to influence a person's default behaviours. The current paper empirically tests whether temporal discounting, known to disproportionately affect MF decision making, mediates the influence of monetary and gamified rewards on participant choice.

Temporal discounting
Choosing the timing of reward delivery is likely to be a critical design decision when using gamified rewards to encourage behaviours. This is because the subjective value of rewards can change based on the length of the delay before presentation (Ainslie, 1975;Myerson and Green, 1995). Both humans and other animals value rewards given immediately much more than rewards that are delayed or given later (Luo et al., 2009;van den Bos and McClure, 2013), with this effect being most pronounced early in the delay and reducing over time to follow a hyperbolic curve (Frederick et al., 2002;Green and Myerson, 2004). Therefore, even small delays early on in the interaction may significantly impact the influence of the reward on decision making. This is particularly important in terms of interactions with apps where gamified rewards are given, as there are currently no studies looking at temporal discounting in this context. The current study thus provides the first step in the merging of the highly theoretical research around temporal discounting with the context of gamification, an area of study that regularly lacks this theoretical focus (Seaborn and Fels, 2015).
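To make the shape of the hyperbolic curve concrete, the following minimal sketch uses a commonly used hyperbolic form, V = A / (1 + kD), where A is the reward amount, D the delay, and k a discount rate. The function name, the value of k, and the delays are illustrative assumptions, not parameters estimated in this paper.

```python
def hyperbolic_value(amount: float, delay_s: float, k: float) -> float:
    """Subjective value of `amount` delivered after `delay_s` seconds.

    Uses the hyperbolic form V = A / (1 + k * D). `k` is a hypothetical
    discount rate chosen for illustration only.
    """
    if delay_s < 0 or k < 0:
        raise ValueError("delay and discount rate must be non-negative")
    return amount / (1.0 + k * delay_s)


K = 0.1  # assumed discount rate per second (illustrative, not fitted)
DELAYS = (0, 6, 12, 18)  # example delays in seconds

values = [hyperbolic_value(1.0, d, K) for d in DELAYS]
# Loss of value over each successive 6-second step.
drops = [values[i] - values[i + 1] for i in range(len(values) - 1)]

for delay, value in zip(DELAYS, values):
    print(f"delay={delay:>2}s  value={value:.3f}")
```

Under these assumed numbers the first six seconds of delay remove more value than either of the following six-second steps, which is the property that makes early reward placement potentially important.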
The temporal discounting effect is believed to emerge from the MF system's inability to create an association between items that are temporally distant. This is believed to be due to a decreased ability of dopamine neurons to form associations between the action (or cue) and a temporally distant reward (Kable and Glimcher, 2007). As this association is required for learning to occur, the decrease in dopaminergic activity subsequently reduces the subjective value of the reward (Peters and Büchel, 2009). In contrast, as the MB system does not rely on these learned associations, it is not as heavily impacted by reward delays (McClure et al., 2004). As such, most studies that involve only the prospective evaluation of future scenarios, rather than involving direct experience, found minimal reductions in value even for rewards that are weeks or months away (Kirby and Marakovic, 1995). Current research on temporal discounting usually only involves a simple choice between two paths, selected based on a button push or questionnaire answer (Kable and Glimcher, 2007; Kirby and Marakovic, 1995; McClure et al., 2007). In these instances, temporal discounting is measured from the point at which the simple behaviour is executed fully. And yet, many actions are more complex than simple button presses, and may require a lengthy sequence of actions to be completed before rewards are given. For example, completing a language learning session in an educational app involves multiple steps, including opening the app, choosing the lesson to start, and then completing each individual question or test that is required. Usually a reward is only presented after all these components of the sequence are completed, potentially leading to strong temporal discounting effects. However, as temporal discounting research generally only looks at simple behaviours, it is difficult to directly apply its findings to these more complex behaviours.
Therefore, we devised two lab based experiments to explore the influence of reward placement further, particularly in the context of longer sequences of behaviours.

Research rationale
Currently, the common strategy for gamification designers is to present rewards after the user has completed the desired task within the app. And yet, temporal discounting research indicates that this may not be the most effective place to present a reward (Luo et al., 2009;van den Bos and McClure, 2013). Due to the time taken to execute the entire sequence, the motivating effect of the reward may be reduced. In this paper, we explore whether rewarding users upon opening an application provides a stronger incentive to select that application again when compared to the common practice of rewarding at the end of the sequence. To identify this, we ran two online experiments: the first exploring this effect using monetary rewards (Study 1) so as to situate the results within the rest of the decision-making literature (e.g. Cushman and Morris, 2015; Daw et al., 2011), while the second used a points-based leaderboard as the reward (Study 2), providing insights into how temporal discounting affects gamified rewards.

Aims and hypothesis
The aim of the first study was to test the effect of reward placement on app selection frequency. Participants were asked to complete a data logging task with multiple steps, devised to reflect the act of saving a transaction in a budget tracking application (e.g. Money Lover) by categorising a receipt. The task was deemed to be complex enough that it would not be completed too quickly, but mundane enough to not be in itself entertaining. A mundane task makes it more likely that any reinforcing effects measured are coming directly from the rewards, and also is more representative of the contexts where gamification is used (e.g. non-gaming contexts that require an extra motivational boost). Three apps were available in the experiment, each providing a reward at different points: immediately after selecting the app (pre-task placement), directly after the categorisation task itself (post-task placement), or after an artificial buffering delay that followed the task (delay placement). Our hypothesis for study 1 is: (H1) Reward placement will have a statistically significant effect on the selection frequency of each app, such that earlier delivery will improve selection frequency when compared to later delivery.

Participants
Seventy participants (26F, 44M) were recruited from the UK pool of Amazon Mechanical Turk (MTurk) workers. This breakdown is approximately representative of the gender distribution on the platform for workers in the UK (Difallah et al., 2018). MTurk has been shown to hold a more diverse participant pool than usual lab-based studies conducted in universities, both ethnically and socioeconomically, which improves the applicability of results to a larger population (Henrich et al., 2010). The study was conducted according to the British Psychological Society ethics guidelines (The British Psychological Society, 2018), and was cleared by the university's ethics review process for low-risk studies. The mean age of participants was 31.38 years (SD = 9.28 yrs), ranging from 19 to 64. Most participants (80%) reported they had completed at least a bachelor's degree, while the remaining participants either reported only completing secondary school (18%), or none of the above (2%).

Study design
The study used a within-participants design, with reward placement (3 levels: pre-task placement, post-task placement, delay placement) as the independent variable, and selection frequency (total number of times participants selected the app) as the dependent variable.

Materials
App Selection Task
Participants were asked to select an application before starting a data logging task. Three apps were available across the experiment, each varying in the placement of the reward (see Section 4.2.4). These apps were selected by each participant based on their respective coloured icons, and all involved the same type of expense categorisation task. As such, each trial involved one app interaction where data logging was performed. These apps were represented within the participants' browser window, but were shown in full-screen mode to better imitate a native application. The app icons were presented in pairs, whereby participants had to decide between two alternatives during the app selection screen. Two apps, rather than three, were presented at a time to ensure that the app selection task was as clear and as easy to complete as possible. Presenting options in such a manner has been previously shown to improve data clarity, and produce more clearly defined and stable results (Windsor et al., 1994). All app pairings (pre-task vs post-task; pre-task vs delay; post-task vs delay) were shown 20 times, creating a total of 60 pairs displayed across the experimental session. To ensure that participants did not have any prior familiarity with the app icons, Tibetan symbols that varied in background colour (either blue, green, or pink) were used. These types of symbols have been used previously in decision-making research (e.g. Daw et al., 2011). The icons associated with each reward placement condition were randomised for each participant but remained constant throughout their experimental session. The app chosen between each presented pair was the measure used as the dependent variable (selection frequency) in the analysis. As part of the instructions for the task, participants were told that they would be categorising expense statements for three different companies, each with their own colour-coded application.
They were also made aware that the companies could differ in when they presented payment in the app, and were asked to choose between the two apps displayed. Finally, they were instructed that payment would be based on the number of expense forms completed, and that the experiment would automatically finish once they had reached the outlined time limit.
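The pairing scheme described above (three pairings, each shown 20 times, with icon colours fixed per participant) can be sketched as follows. This is only an illustration: the function names are hypothetical, and shuffling the order of the 60 pairs is an assumption, since the text does not state how pair order was determined.

```python
import itertools
import random

CONDITIONS = ("pre-task", "post-task", "delay")
ICON_COLOURS = ("blue", "green", "pink")
REPEATS_PER_PAIR = 20


def build_schedule(seed=None):
    """Return a list of 60 (left_app, right_app) pairs for one participant."""
    rng = random.Random(seed)
    trials = []
    for first, second in itertools.combinations(CONDITIONS, 2):
        for _ in range(REPEATS_PER_PAIR):
            pair = [first, second]
            rng.shuffle(pair)  # randomise which app appears on which side
            trials.append(tuple(pair))
    rng.shuffle(trials)  # assumed: pair order is randomised across the session
    return trials


def assign_icons(seed=None):
    """Map each condition to one icon colour, fixed for the whole session."""
    rng = random.Random(seed)
    colours = list(ICON_COLOURS)
    rng.shuffle(colours)
    return dict(zip(CONDITIONS, colours))


schedule = build_schedule(seed=1)
icons = assign_icons(seed=1)
print(len(schedule), icons)
```

Each condition appears in two of the three pairings, so every app is offered 40 times across the 60 selection screens.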
Data Logging Task
Participants had to match a receipt description to a list of expense codes (see Fig. 1). Each trial consisted of one receipt categorisation. This was done to make the temporal distance between app selection and reward as consistent as possible for each application. Allowing users to categorise multiple receipts in each trial would have introduced a major confound by making the time between reward delivery and action inconsistent across participants. After completing each trial, participants experienced a loading delay of six seconds. There were a total of 66 receipts (termed 'expense statements'), 6 practice and 60 main trials, which were picked in a random order regardless of the app icon chosen. The task was developed to simulate the kind of behaviour commonly executed on expense and budget management data logging apps (e.g. Money Lover), where participants have to log their spending. Numeric codes were included to add difficulty and increase the length of time taken up by the task. The task was designed to be mundane enough that it did not interact with the effect of the rewards. The task was identical for each condition except for the variation of reward placement. Participants were also given the option of exiting a trial by pressing the X button located on the top left of the screen.

Conditions - Reward placement
Each app varied the placement of the reward given when engaging in the data logging task. A reward appeared either after selecting an app and before the logging task (pre-task placement), after completing the logging task (post-task placement), or after a loading delay following the logging task (delay placement). The current experiment used monetary rewards (a points-based leaderboard is explored in Study 2). Monetary rewards are commonly used in experiments because they are considered to be universally reinforcing (e.g. Daw et al., 2011; Otto et al., 2013) and thus are a more stable way of assessing reward influence. They are more effective at promoting behaviours than punishment (Li et al., 2016), and their inclusion allows us to more easily interpret and compare results across other reward-based research, particularly because they have already been shown to successfully influence the MF system (Cushman and Morris, 2015; Otto et al., 2013). The monetary reward for each trial was represented by a £1 coin (local currency), shown for 2 seconds during every trial to signal payment for that expense form submission. Each coin represented a payment of $0.12 (default MTurk currency) and was paid through the platform. The participants were told that they would accumulate payment for each correct expense form they submitted and would be given the amount accumulated at the end of the study. The aim of these instructions was to incentivise participants to maximise the number of expense forms they completed and to react positively to each reward. The maximum amount of money any participant could accumulate over the experiment session was $7.92. Although not informed of this until the end of the study, all participants were in fact given $8.00 at the end of the study regardless of performance. The payment rate was based on guidelines for fair MTurk payment practices (Lascau et al., 2019).
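The stated maximum of $7.92 is consistent with every one of the 66 receipts (6 practice and 60 main) earning $0.12. The short check below makes that arithmetic explicit; that practice trials were paid is an inference from the figures rather than something the text states, and the variable names are illustrative.

```python
from decimal import Decimal

PAY_PER_FORM = Decimal("0.12")  # USD per completed expense form (from the text)
PRACTICE_TRIALS = 6
MAIN_TRIALS = 60
FLAT_PAYMENT = Decimal("8.00")  # amount actually paid to every participant

# Assumption: practice trials count towards the accumulated payment.
max_accumulated = PAY_PER_FORM * (PRACTICE_TRIALS + MAIN_TRIALS)
main_only = PAY_PER_FORM * MAIN_TRIALS

print(f"all 66 forms: ${max_accumulated}")
print(f"main trials only: ${main_only}")
```

Decimal arithmetic is used so the currency values are compared exactly rather than through binary floating point.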
The delay condition included an artificial delay of 6 seconds, during which a screen displayed the text "Loading... Please wait a few seconds." Previous research has shown that adding a delay before a reward is presented reduces the value of that reward (van den Bos and McClure, 2013), yet this effect has not yet been replicated in the context of app interactions. The delay condition was thus included to ascertain whether this established effect could be replicated in the new experiment setting, as it uses the same type of delay as other research (wait time). It also allows us to gain an insight into whether our experimental paradigm is sensitive enough to explore temporal discounting effects. As this type of delay has been effective in other paradigms (e.g. Hayden, 2016), a lack of difference in selection frequency between this and any other conditions would indicate that the experimental setup was not sensitive enough to detect temporal discounting. As such, this condition was included as a type of control to show delays can significantly impact shorter app interactions. Following pilot testing, a loading delay of 6 seconds was chosen so as to minimise the potential for participants to switch tasks (which commonly occurs with delays longer than 9 seconds; Gould et al., 2015) as well as to allow enough trials to be completed while not tiring the participants.
The post-task condition used a type of delay (execution time) which has been less extensively researched. When designing the experiment, it was thus unknown whether this type of delay would produce significant temporal discounting effects. As such, the expense form task was designed to be complicated enough to take approximately 12 seconds, ensuring a sufficient delay to produce an effect.
Fig. 1. The three screens shown for each trial starting with A) an app selection screen where participants choose between the two apps presented, B) a data logging task where expense codes are matched to the purchase description, and C) an artificial delay presented as a loading screen before participants were taken back to the app selection screen.

Procedure
Participants recruited through the Amazon MTurk platform took part in the study using their own device. To participate, this device was required to use a physical keyboard as an input mechanism and to meet a minimum screen size of 600x800 px. The experiment site was hosted on a university server and used the jsPsych JS library version 6.1.0 (de Leeuw, 2015). This library has been previously found to record reliable response times when compared to other popular experimental resources (de Leeuw and Motz, 2016; Reimers and Stewart, 2015). The online platform also allows for easy recruitment of a large number of participants in a short time span, greatly reducing time burdens on experimenters and assuring uniformity in the materials presented (Mason and Suri, 2012). Upon selecting the HIT (experiment) on the MTurk platform, participants were given information as to the nature of the study and were asked to give consent to take part. They were then told that the experiment entailed choosing from two icons representing fictional web apps, and then completing an expense data logging task. The instructions stated that all apps gave the same amount of reward for each completed trial, but could differ in when that reward would appear. Participants were instructed to make their icon selections based solely on which application they preferred at that time. Before starting the trials, participants completed a demographics questionnaire, which included questions about age, sex, education, and occupation. Participants were then asked to complete a set of practice trials, whereby they were presented twice with each app icon and the data logging task. This ensured that all participants had been consistently exposed to each type of reward placement before they started the experiment trials. Following the structure of similar decision making research (Cushman and Morris, 2015; Daw et al., 2011; Otto et al., 2013), participants were required to use the arrow keys to choose from a pair of apps displayed for two seconds, otherwise the trial would time out. There was no timer when completing the data logging task, and participants had the option to cancel out of any trial if they wished to do so. The location on the screen where each app icon appeared, the colours associated with each condition, and the expense statements were all randomised for each participant. The expense form had to be completed correctly to be submitted; otherwise the participant was locked out of the task for five seconds, during which the incorrect answer was highlighted. Participants were locked out following an error to prevent them from guessing multiple codes in quick succession, or from quickly entering wrong values on purpose to be given the correct answer. Participants were given a time limit of approximately 30 minutes to complete the main task; after this point they were no longer presented with trials even if they had not completed all of them. This length of time was tested in pilot studies to ensure that the majority of participants would be able to complete the trials within this time. At the end of the experiment, participants were asked to rank the apps based on preference, to give an account of why they had those preferences, and to give details on any strategies used throughout the task. Participants were then forwarded to a final screen, where they were fully debriefed on the nature and aims of the study.
Fig. 2. Structure of study 1 conditions. The pre-task condition presents a reward immediately following screen A, the post-task condition does so after screen B, and the delay condition after screen C. Each application represents one condition and always presents the reward for each trial in the same location.

Data cleaning
A participant's data was included in the analysis if at least 70% of all 60 trials had been completed successfully. This threshold is similar to those used in comparable experiments (e.g. Cushman and Morris, 2015; Otto et al., 2013) and was chosen before data collection. The data from 11 participants (5F, 6M), who had completed less than 20% of trials, was removed. Only four participants did not complete the main task of the study within 30 minutes, meaning they were presented with fewer than the total of 60 trials. However, since all four participants had still successfully completed over 70% of trials, their data was included in the analysis. Additionally, four participants (1F, 3M) were removed due to almost exclusively picking the app on only one side of the screen (over 85%), suggesting they were not basing their decisions on subjective preference for each item. The data cleaning process meant that the data of only 55 of the 70 participants was suitable for analysis.
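The two inclusion rules above (at least 70% of the 60 trials completed, and no more than 85% of selections on one side of the screen) can be expressed as a simple filter. This is a minimal sketch: the record layout and field names are hypothetical, and how timed-out selections were counted is not specified in the text.

```python
from dataclasses import dataclass

TOTAL_TRIALS = 60
MIN_COMPLETION = 0.70  # inclusion threshold stated in the text
MAX_SIDE_SHARE = 0.85  # selections on one side above this share are excluded


@dataclass
class ParticipantSummary:
    """Hypothetical per-participant summary; field names are illustrative."""
    participant_id: str
    completed_trials: int
    left_choices: int
    right_choices: int


def is_included(p: ParticipantSummary) -> bool:
    """Apply the completion and side-bias criteria to one participant."""
    if p.completed_trials / TOTAL_TRIALS < MIN_COMPLETION:
        return False
    choices = p.left_choices + p.right_choices
    if choices == 0:
        return False
    side_share = max(p.left_choices, p.right_choices) / choices
    return side_share <= MAX_SIDE_SHARE


example = ParticipantSummary("p01", completed_trials=58,
                             left_choices=31, right_choices=27)
print(is_included(example))
```

Applied to the counts reported above, 11 exclusions for low completion and 4 for side bias leave 70 - 15 = 55 participants.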

App selection analysis
The data was analysed using the Bradley-Terry model (Bradley and Terry, 1952), which calculates a selection score based on how often an item is chosen when paired with other items, while considering the transitive property of the items (the selection scores for each condition are shown graphically in Fig. 4). The probability model ranks items based on maximum likelihood estimates and is recommended for studies where options are presented in pairs (Cattelan, 2012; Yao and Simons, 1999). The technique has been used in previous HCI studies analysing similar types of data (Ashktorab et al., 2019; Serrano et al., 2017). The analysis used version 2-1.0-9 of the BradleyTerry2 R package (Turner and Firth, 2012) and R version 3.5.1 Feather Spray (R Core Team, 2014). The model ranked pre-task placement as the most selected condition (λ = 0.286), followed by post-task placement (λ = -0.064), with delay placement being the least selected (λ = -0.222). As post-task was the middle item in the ranking and is the standard way rewards are presented in most data logging applications, it was used as the reference category in the pairwise comparisons. Preference for the pre-task placement condition was significantly greater than for the post-task placement condition (Z = 6.86, SE = 0.51, p < 0.001), whereas preference for the delay placement condition was significantly lower than for the post-task placement condition (Z = -3.09, SE = 0.51, p = 0.002), supporting our hypothesis. To test whether participants changed how they performed the task based on the timing of the rewards being given, a linear mixed effects model was run on the time taken to complete the task and the number of errors. There was no significant difference in time to complete the task for the pre-task (t = -0.76, SE = 378.47, p = 0.447) and delay (t = -1.68, SE = 412.01, p = 0.092) conditions compared to the baseline post-task condition.
There was also no significant difference in the number of errors for the pre-task (t = 0.948, SE = 0.237, p = 0.343) and delay (t = -1.768, SE = 0.026, p = 0.077) conditions compared to the post-task condition.
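For readers unfamiliar with how Bradley-Terry selection scores are obtained from paired choices, the sketch below fits the model with a simple minorisation-maximisation (MM) iteration. It is not the fitting routine of the BradleyTerry2 package used in the analysis, and the win counts are invented purely for illustration; they are not the study data.

```python
import math


def fit_bradley_terry(wins, iterations=200):
    """Estimate log-strengths from a matrix of pairwise win counts.

    wins[i][j] is the number of times item i was chosen over item j.
    Uses the standard MM update; every item needs at least one win.
    Scores are centred so that they sum to zero.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Rescale so the geometric mean is 1 (log-strengths sum to zero).
        log_mean = sum(math.log(s) for s in updated) / n
        strengths = [s / math.exp(log_mean) for s in updated]
    return [math.log(s) for s in strengths]


ITEMS = ("pre-task", "post-task", "delay")
# Hypothetical counts for one 60-trial session (20 trials per pairing).
EXAMPLE_WINS = [
    [0, 12, 13],  # pre-task chosen over post-task 12 times, over delay 13 times
    [8, 0, 11],   # post-task chosen over pre-task 8 times, over delay 11 times
    [7, 9, 0],    # delay chosen over pre-task 7 times, over post-task 9 times
]

scores = fit_bradley_terry(EXAMPLE_WINS)
for name, score in zip(ITEMS, scores):
    print(f"{name}: {score:+.3f}")
```

Centring the scores at zero is one common identification choice; fixing a reference item at zero, as in the pairwise comparisons reported above, yields the same differences between items.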
Using the choix package version 0.3.3 (Maystre, 2015) running on Python 3.6 and based on the Bradley-Terry model selection scores, choice probabilities were calculated for each application. The app with pre-task reward had a 62.4% probability of being chosen when paired against the app with delay placement, and a 58.7% probability of being chosen when paired against the app with post-task placement.
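These probabilities follow from the reported selection scores through the Bradley-Terry choice rule, P(a chosen over b) = exp(λ_a) / (exp(λ_a) + exp(λ_b)). The sketch below recomputes them with the standard library only; it does not call choix, and the function name is illustrative.

```python
import math

# Selection scores reported in the app selection analysis.
LAMBDA = {"pre-task": 0.286, "post-task": -0.064, "delay": -0.222}


def choice_probability(a: str, b: str, scores=LAMBDA) -> float:
    """Probability that app `a` is chosen when paired against app `b`."""
    return 1.0 / (1.0 + math.exp(scores[b] - scores[a]))


p_pre_vs_delay = choice_probability("pre-task", "delay")
p_pre_vs_post = choice_probability("pre-task", "post-task")

print(f"pre-task vs delay:     {100 * p_pre_vs_delay:.1f}%")  # 62.4%
print(f"pre-task vs post-task: {100 * p_pre_vs_post:.1f}%")   # 58.7%
```

Because only the difference between two scores enters the formula, the result does not depend on how the scores are centred.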

Self-Reported app preference
As part of the post-experiment questionnaire, participants were asked to rate their preference for each of the apps by placing each app icon into one of three category boxes labelled most preferred, no preference, and least preferred. Overall 52.7% of participants (N=29) rated the pre-task placement app in the most preferred category, with 27.3% (N=15) rating the post-task, and 16.4% (N=9) rating the delay placement apps as their most preferred. In the no preference category, the pre-task app was placed by 32.7% (N=18), the post-task app by 52.7% (N=29), and the delay app by 58.2% (N=32). Lastly, for the least preferred category, the pre-task app was chosen by 14.5% (N=8), the post-task app by 20.0% (N=11), and the delay app by 25.5% (N=14) of participants. The association between preference category and app icon was statistically significant [χ²(4, N = 55) = 17.687, p = 0.001], mostly due to the high preference for the pre-task app.
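The test of association above can be illustrated with a short Pearson chi-square computation on the reported counts. This is a sketch rather than the analysis script used in the study; the function names are hypothetical, and the closed-form p-value helper is only valid for an even number of degrees of freedom.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf_even_dof(x, dof):
    """Upper-tail probability of a chi-square variable; even dof only."""
    assert dof % 2 == 0, "closed form below requires an even number of degrees of freedom"
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

# Rows: most preferred, no preference, least preferred
# Columns: pre-task, post-task, delay (counts reported for Study 1)
counts = [
    [29, 15, 9],
    [18, 29, 32],
    [8, 11, 14],
]
stat, dof = chi_square_statistic(counts)
p_value = chi2_sf_even_dof(stat, dof)
print(f"chi2({dof}) = {stat:.3f}, p = {p_value:.3f}")  # chi2(4) = 17.687, p = 0.001
```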

Logging task cancellation
Participants were allowed to cancel out of the logging task by pressing the X icon on the top left of the screen, effectively ending that particular trial. The feature was primarily included to allow participants to skip past any items they found particularly challenging or if any other issues arose. It also allowed us to see whether participants would cancel out of the logging task after being given the reward in the pre-task condition. Across the experiment, only one of the original 70 participants was found to avail of the cancel feature, using it indiscriminately across all of the apps in 81% of all trials. This participant was removed from the analysis as they did not meet the threshold of completing at least 70% of trials. No other participant used the cancel feature. When examining the answers in the post-experiment questionnaire, many participants stated that although they had received a reward, they felt that payment of this reward was not guaranteed if the task was cancelled.

Discussion
This study aimed to identify how reward placement could be used to increase the frequency with which people selected an application. Our results show that people more frequently selected the app that placed rewards early in the interaction (i.e. immediately after selection and before completing the data logging task), compared to those that rewarded users after completing the data logging task, supporting our hypothesis. We propose that the results show a reduction in perceived reward value due to temporal discounting (Ainslie, 1975). By placing the reward immediately following app selection, temporal discounting of the reward appears to be minimised, improving the reward's ability to motivate a user to re-select the app by maximising the reward's value. If the reward appears further in the sequence, its valence is reduced, making users less likely to select the app again. Our work is the first to show that temporal discounting effects operate during short interactions with an application.
The findings also carry implications for the understanding of where in a behavioural sequence a reward (gamified or otherwise) is likely to be most impactful in encouraging more frequent app engagement. According to the research on the effects of rewards on MF processing (de Wit and Dickinson, 2009), actions carry an associated value based on their proximity to rewards, which controls how likely the action is to be repeated in the future. Our findings suggest that, in multi-step sequences, increasing the value of the initial action (i.e. app selection) is significantly more successful at improving frequency of selection than including rewards at any other point in the sequence. Previous work has emphasised how starting an action sequence increases the likelihood that the rest of the sequence is executed, both because of the environmental cues controlling choice (Smaldino and Richerson, 2012) and internal motives related to sunk cost (Arkes and Blumer, 1985). We saw this in our work as people still completed the trial in the pre-task condition even though they had the option to cancel the trial after receiving the reward.
Based on these findings we advise that, if delivering a reward, designers of gamified applications should consider placing that reward early (i.e. directly after app selection) as this may encourage more frequent app use. Nevertheless, since gamified rewards are not limited in number, this does not need to be the only location that a reward may appear. It is possible and even likely that a combination of reward structures is needed to encourage the desired behaviour. Therefore our advice is centred around the need for implementing some early rewards when designing gamified applications. However, the current study has a major limitation in that it uses money rather than more common gamified reward techniques such as points or leaderboards. Unlike money, which is considered to be universally reinforcing (Bijou and Baer, 1966;Latham and Huber, 1991), points given on their own are usually less successful at motivating behaviour. This is because they lack the applied context or purpose that gives them meaning. Leaderboards can supply this meaning, and are generally successful at promoting desired behaviours when paired with points (Landers et al., 2015;Mekler et al., 2013). Yet it is still unclear whether non-monetary gamified rewards such as points and leaderboards would operate in a similar way to monetary rewards. We therefore conducted a further study to explore whether the previous findings extend to contexts where a points-based leaderboard is used as a reward instead. Importantly, using a points-based leaderboard also allows us to observe whether the novel effects seen in study 1 can be replicated in the context of a more ecologically valid gamification mechanic.

Aims and hypothesis
The aim of this study was again to investigate how the placement of a reward within an app interaction affects selection frequency, this time using a points-based leaderboard rather than monetary rewards. The gamification technique used in this experiment involved virtual points that control the ranking of the participant on a leaderboard. Our hypothesis for study 2 is: (H2) Reward placement will have a statistically significant effect on the selection frequency of each app, such that earlier delivery will improve selection frequency when compared to later or no delivery.
In addition, rather than including a delay condition in this study, it was replaced with a no-reward condition. The rationale behind this choice was that we wanted to make sure that the leaderboard and points were reinforcing, which would be evidenced by a lower selection frequency for the no-reward condition compared to the other conditions. Additionally, the first study already provided evidence that the delay condition was the least preferred option, meaning that this condition was no longer necessary.

Participants
A total of 70 participants (21F, 48M, 1 other) agreed to take part in the study and were recruited from the MTurk platform. Those who participated in Study 1 were prevented from taking part in Study 2. This sex distribution was again representative of that on the platform for workers in the UK (Difallah et al., 2018). The study was conducted according to the British Psychological Society ethics guidelines (The British Psychological Society, 2018), and was cleared by the university's ethics review process for low-risk studies. The mean age of the participants was 33.63 (SD = 9.00), ranging from 19 to 56. Most participants (60%) reported they had completed at least a bachelor's degree, while the remaining participants either reported only completing A-levels or Secondary Education (39%), or none of the above (1%).

Study design
Similar to the previous study, the experiment involved a within-participants design with reward placement (3 levels: pre-task placement, post-task placement, no reward placement) as the independent variable, and selection frequency (total number of times all participants selected the app) as the dependent variable.

Materials
The app selection screen and the data logging task remained identical to those used in Study 1 [see Section 4 for details]. Each study used the same icons, colours, and expense items, with the studies only varying how the reward was displayed. Instead of only presenting a large pound coin to represent the reward, a large digital gold coin was shown along with a leaderboard (see Fig. 5). When receiving points in the rewarded conditions, an animation would move the coin into the leaderboard, increasing the participant's score and improving their ranking if their score went above another player. The leaderboard was randomly populated with a uniform distribution of scores between 16 and 66 points. This was done to ensure that the average participant would be able to slowly climb the rankings to an above-average position. In this way, the potentially demotivating situation of being kept at the bottom of the leaderboard (Massung et al., 2013) was avoided.
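The leaderboard mechanic described above can be sketched as follows. This is only an illustration: the number of simulated players (nine here), the use of whole-number scores, and the function names are assumptions, as the text only specifies the uniform 16 to 66 range and that ranking improves when the participant's score goes above another player's.

```python
import random

def build_leaderboard(n_players, low=16, high=66, seed=None):
    """Draw simulated player scores uniformly from [low, high] (inclusive).
    n_players and integer scores are assumptions for this sketch."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n_players)]

def participant_rank(participant_score, other_scores):
    """Rank 1 is the top position. The participant only passes another player
    once their score is strictly above that player's score."""
    return 1 + sum(1 for s in other_scores if s >= participant_score)

others = build_leaderboard(n_players=9, seed=1)
for points in (0, 20, 40, 60):
    print(points, "points -> rank", participant_rank(points, others))
```

Because every simulated score is at least 16, a participant starting from zero begins in last place and climbs as points accumulate.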

Conditions - Reward placement
Both the pre-task placement and post-task placement conditions were the same as those used in Study 1 [see Section 4 for details]. As previously mentioned, delay placement was replaced by a no-reward condition. In this condition, when completing the trial participants would receive no reward with no coin animation. Although the leaderboard was still visible in this condition, no points were added to their tally. As detailed in Section 4.2.4, the delay condition was included so as to confirm the experiment could measure significant differences in choice based on wait time delay, and was thus able to detect temporal discounting. The key comparison within Study 1 was the difference between pre- and post-task conditions, showing that execution time affects temporal discounting. For the current study however, ascertaining whether the gamified rewards used were reinforcing was more important than including a delay condition, as the viability of the experimental paradigm for measuring temporal discounting was already shown. Importantly, whether points-based rewards would be reinforcing in this experimental paradigm was not clearly known. Additionally, adding a fourth condition would have significantly increased the number of trials due to the increase in the number of permutations needed. As a result, rather than a delayed reward condition, it was deemed critical to include a no-reward condition in this study so as to confirm that the leaderboard was reinforcing to participants.

Procedure
The procedure was similar to that used in study 1 except that the time taken to complete the experiment was reduced. Since the delay placement condition was removed, the task was able to progress more quickly. All participants were instead given 17 minutes to complete the main task, at which point they were no longer presented trials even if they had not completed all 60 of them. This length of time was tested in pilot studies to ensure that the majority of participants would be able to complete the trials within this time. As the study took less than 30 minutes, $6.00 payment was given. The rate of payment was calculated at a $12/hr rate as suggested by research on fair payment for experiment participants on Amazon Mechanical Turk (Lascau et al., 2019).

Data cleaning
The thresholds for removing participant data were the same as in study 1, where 70% of the trials had to be completed successfully. Three participants (3M) were removed due to not passing this threshold within the 17-minute task. An additional six participants (3F, 3M) were removed due to a heavy bias towards one side of the screen (> 85%), indicating that their choices were not based on a preference for a specific app as specified in the instructions. In total, out of the original 70 participants, 61 (18F, 42M, 1 other) provided data suitable for analysis following data cleaning procedures.
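The two exclusion rules above can be expressed as a small filter. This is a hedged sketch: the trial record format, the function name `keep_participant`, and the decision to compute side bias over completed trials only are assumptions, while the 70% completion threshold, the 85% side-bias threshold, and the 60 trials come from the text.

```python
def keep_participant(trials, total_trials=60, min_completed=0.70, max_side_bias=0.85):
    """Return True if a participant's data passes both exclusion rules.

    trials: list of dicts with keys 'completed' (bool) and 'side' ('left' or 'right'),
    a hypothetical record format used only for this sketch.
    """
    completed = [t for t in trials if t["completed"]]
    # Rule 1: at least 70% of all trials completed successfully
    if len(completed) / total_trials < min_completed:
        return False
    # Rule 2: no more than 85% of choices on one side of the screen
    left = sum(1 for t in completed if t["side"] == "left")
    bias = max(left, len(completed) - left) / len(completed)
    return bias <= max_side_bias

balanced = [{"completed": True, "side": "left" if i % 2 else "right"} for i in range(60)]
one_sided = [{"completed": True, "side": "left" if i < 54 else "right"} for i in range(60)]
print(keep_participant(balanced))       # True
print(keep_participant(balanced[:41]))  # False: fewer than 70% of 60 trials completed
print(keep_participant(one_sided))      # False: 90% of choices on the left
```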

App selection analysis
The data was again analysed using a Bradley-Terry model (Bradley and Terry, 1952) which provided estimates (selection scores) of relative preference for each item. The model ranked pre-task (λ = 0.466) as the most preferred level of reward placement, followed by post-task (λ = -0.022), with no-reward being the least preferred (λ = -0.443). As post-task was the middle item in the ranking and the most common way of delivering points in gamified apps, it was used as the reference category in the pairwise comparisons. Selection frequency for the pre-task placement condition was significantly greater than for the post-task placement condition (Z = 9.746, SE = 0.05, p < 0.001), whereas selection for the no-reward condition was significantly lower (Z = -8.439, SE = 0.05, p < 0.001). Fig. 8 shows the ranking of each app based on preference estimates with 95% confidence intervals based on quasi-standard errors. Using the choix package, it was calculated that the pre-task app had a 62.0% chance of being chosen when compared to the post-task app, and a 71.3% probability of being chosen when paired against the no-reward app. Additionally, the post-task app was chosen with 60.4% preference when compared to the no-reward app. The comparisons with the no-reward app condition suggest that the points-based leaderboard reward itself was generally reinforcing.
[Figure caption] The count next to the participant location on the leaderboard increased by one every time the coin was shown. When no reward was presented, the leaderboard was still shown without the coin and the count remained the same. The position on the leaderboard changed as the count increased such that the player climbed the leaderboard throughout the experiment.
We again tested whether participants changed how they performed the task across conditions, using a linear mixed effects model on the time taken to complete the task and the number of errors. There was no significant difference in time to complete the task for the pre-task placement (t = -1.64, SE = 237.63, p = 0.102) and no reward placement (t = -0.453, SE = 284.74, p = 0.651) conditions compared to the post-task placement condition. There was also no significant difference in the number of errors for the pre-task placement (t = -0.782, SE = 0.021, p = 0.435) and no reward placement (t = -1.646, SE = 0.025, p = 0.100) conditions compared to the post-task placement condition.

Self-Reported app preference
As part of the post-study questionnaire, participants were again asked to rate their stated preference for each app by placing its icon into one of three categories: least preferred, no preference, and most preferred. Out of 52 total participants, 53.8% (N=28) placed the pre-task app in the most preferred category, while 40.4% (N=21) did so for the post-task app, and 13.5% (N=7) for the no-reward app. For the no preference category, the breakdown was 34.6% (N=18) for pre-task, 46.2% (N=24) for post-task, and 50.0% (N=26) for no-reward. Lastly, the least preferred category had 11.5% (N=6) for pre-task, 13.5% (N=7) for post-task, and 36.5% (N=19) for no reward. The association between preference category and application was significant [χ²(4, N = 61) = 41.72, p < 0.001], with the model showing a heavy preference for the pre-task app.
Fig. 6. Structure of study 2 conditions. While the pre-task and post-task conditions present the rewards in the same location as the previous study, they use points connected to a leaderboard instead of money. In addition, the delay condition was exchanged for a condition where no reward is presented. Each application represents one condition and always presents the reward for each trial in the same location, except for the no-reward condition where only the leaderboard is shown.

Logging task cancellation
Participants were again given the ability to cancel a trial by pressing the X button located on the top left side of the screen. Only two participants used this cancellation feature during the experiment. One participant indiscriminately cancelled a total of 86% of trials regardless of reward placement, and so was not included in the analysis, while the other participant only cancelled one item. Therefore, similar to Study 1, it does not appear that participants were incentivised to exploit the reward contingencies by abandoning the trial.

Discussion
Similar to study 1, our results show a significant impact of reward placement on selection frequency. We found that the applications that gave rewards were selected more frequently than the one that did not, showing that participants were motivated by the points-based leaderboard in the study. This gives support to the notion that the leaderboard and points were reinforcing to participants. The reward-producing options were preferred even though the participants were told that placement on the leaderboard did not affect how much they earned for taking part in the study. We also found that, as hypothesised, giving users points after app selection but before completing the data logging task led to significantly more frequent app selection than the other reward placement alternative. This means that temporal discounting applies to gamified rewards, and that platforms should consider reward timing when attempting to motivate users with gamification.

General discussion
Our study aimed to inform app interface design, contributing findings on how reward placement can be used to increase the frequency with which people choose apps. Through the application of insights from temporal discounting theory (Ainslie, 1975;van den Bos and McClure, 2013), we measured whether an app that placed rewards early in the interaction, particularly after app selection, affected the frequency with which people chose that app compared to those that rewarded users after data logging. To test this, we conducted two online studies whereby participants selected from identical apps to complete a budget and expense tracking task, varying only in the location and type of the reward presented. Across both experiments the results showed that participants chose the app that rewarded them early on (i.e. after they had selected the app) more frequently than applications that rewarded them after a longer interaction. Importantly, this was shown both for monetary and for gamified (points-based leaderboard) rewards. This supported both our hypotheses and conforms to the predictions made by temporal discounting research (Ainslie, 1975;Hayden, 2016). In addition, no differences in performance measures such as accuracy and completion time were found, indicating that the reward placement had no immediate adverse effects on task performance.

Designing rewards for increased engagement
Across our experiments, we showed that participants chose the pre-task app with greater frequency than the other apps to perform the data logging task, following the predictions of temporal discounting theory (Ainslie, 1975). These findings could be used in any field that uses gamification to motivate behaviour, be it education, health, or commerce-based applications. These results provide clear guidelines for interface designers who wish to use gamified rewards to motivate app use. In particular, they highlight how focusing rewards on the first crucial step, opening the application, could promote a significant increase in the frequency of app selection. Currently the design of most popular incentive structures for gamified apps tends towards rewarding the user after the task has been completed (e.g. Duolingo or Epic Win). However, the approach of rewarding early in the interaction is gaining traction in gaming apps, as daily login rewards are becoming more common. A study looking at the reward mechanisms employed in the 16 most popular mobile games mentioned by the participants found that daily login rewards were included in nearly half of them (Prasad et al., 2020). Our research provides much-needed experimental evidence demonstrating that the approach of rewarding early in the app experience can lead to significant increases in app selection, which may be the reason for the rising popularity of daily login reward structures.
Although not the main aim of the work, our findings also suggest that, if rewarded early, users tend to still complete in-app tasks once they have received a reward. Participants still completed the data logging trial, even in the instances where the reward had already been delivered. This echoes previous work suggesting that encouraging checking behaviours can lead to further app engagement (Oulasvirta et al., 2012). Other work has also shown that because of the sunk cost effect (Arkes and Blumer, 1985) and environmental cues controlling choice (Smaldino and Richerson, 2012), starting an action sequence makes it more likely that the sequence will be completed. Although replication on other populations is needed so as to ensure that this effect is not unique to MTurkers, it suggests that rewarding app selection early may not lead to users abandoning the task after receiving the reward. However, there is still a possibility that early rewards could be exploited by users. As such, it is important to always carefully consider the entire incentive system and what behaviours are being promoted. Nevertheless, there are numerous techniques that can be used to reduce or prevent exploitation. For example, the early reward could be held conditionally such that it is only presented at the start of a sequence if the previous sequence was completed. With this method, the early reward could only be exploited once, while temporal discounting of the reward would still be kept to a minimum. It may therefore be necessary to balance the benefits of reducing temporal discounting with the drawbacks of decreasing the sunk-cost effect, which may mean the reward needs to be moved slightly depending on which behaviours are causing the most sequence abandonment.
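The conditional early reward described above can be sketched as a small piece of state that is checked whenever the app is opened. The class and method names are hypothetical, and granting the reward on the very first open is an assumption made for this sketch.

```python
class ConditionalEarlyReward:
    """Grant an early reward on app open only if the previous sequence was completed."""

    def __init__(self):
        # Assumption: the first ever session is treated as eligible for the reward
        self.previous_completed = True

    def on_app_open(self):
        """Return True if the early reward should be shown for this session."""
        grant = self.previous_completed
        # The current session must now be completed to unlock the next early reward
        self.previous_completed = False
        return grant

    def on_task_completed(self):
        self.previous_completed = True


gate = ConditionalEarlyReward()
print(gate.on_app_open())   # True: reward shown on first open
print(gate.on_app_open())   # False: previous session was abandoned
gate.on_task_completed()
print(gate.on_app_open())   # True: reward restored after a completed task
```

In this design a user who opens the app and immediately leaves collects one reward at most, after which further early rewards are withheld until a task is finished.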

Points and leaderboards as rewards
The current study highlights how early placement of rewards is not only influential when using money as a reward for behaviour, but is also applicable to non-monetary incentives such as a points-based leaderboard. Using leaderboards as a gamification technique to motivate behaviour is already common in education (Caponetto et al., 2014;Sailer and Homner, 2019), business (Morschheuser et al., 2015), exercise (Goh and Razikin, 2015;Koivisto and Hamari, 2014), and crowdsourcing (Halan et al., 2010;Massung et al., 2013) domains. Our work echoes research that emphasises the effectiveness of points and leaderboards in promoting changes in behaviour (Landers et al., 2015;Mekler et al., 2013). We add to this by showing that consideration for the placement of rewards appears to be an important part of creating an effective reward structure for gamification. In particular, based on our findings, we recommend that points be presented as soon as an action sequence is initiated. Especially when attempting to promote app selection, adding rewards to the start of this sequence may significantly improve the chance that it will be repeated. A crucial component of this is ensuring that the points rewards are seen as reinforcing. Leaderboards are thought to achieve this by giving the points an immediately understandable value, by acting as a metric for social comparison, and by giving users information about the boundaries of performance (Bowey et al., 2015;Garcia et al., 2013). Due to the similarities in findings across our studies, we conclude that, as long as the reward is reinforcing, similar improvements in reward value can be seen when rewarding early for both monetary and non-monetary rewards.

Temporal discounting and app choice
The results provide a significant contribution to temporal discounting research by entrenching it within an experimental context that is less constrained and more representative of complex real-world decision sequences. At present, temporal discounting research mostly uses questionnaires and a set of simple binary choices linked to the attainment of a reward (Kirby and Marakovic, 1996). Participants are also usually asked to imagine either receiving a sum of money now, or in a number of days (Paglieri, 2013), then asked to select their preferred option. This methodology has been criticised by several researchers (Navarick, 2004;Paglieri, 2013), who found the rate of discounting to be orders of magnitude lower in questionnaire studies when compared to comparable experiments where participants experience reward delay. Temporal discounting effects may have been grossly underestimated using these methods, with previous work suggesting that reward delays needed to extend over weeks or months to affect behaviour (Kirby, 2009;Kirby and Marakovic, 1996). Our results show that temporal discounting appears to affect decisions significantly, and in terms of altering the frequency of app selection, only a delay of seconds is required.
According to dual process theories, promoting stronger positive associations in the MF system through rewards increases the chance that the action will be completed spontaneously (Kamphorst and Kalis, 2015;Wood and Rünger, 2016). For the process to occur, stronger associations need to be developed through the repeated pairing of the desired behaviour with a reward (Wood and Neal, 2007). Based on previous behaviour-based neuroimaging research (Daw et al., 2011;Kobayashi and Schultz, 2008;Luo et al., 2009;Otto et al., 2013), the increased selection frequency seen for the pre-task reward apps is likely due to stronger positive associations between the reward and action within the MF system (Dayan and Balleine, 2002;Graybiel, 2008). Indeed, the temporal discounting effect seen in studies 1 and 2 suggests that the reward mechanisms are influencing the MF system (Kobayashi and Schultz, 2008).

Delineating between decision and action
Current temporal discounting research (Logue and King, 1991;Navarick, 1998;Rosati et al., 2007) usually involves tasks that can be completed immediately, conflating the behaviour with the decision to act, and thus impacting ecological validity when applied to the context of longer interactions. As such, delays due to task-related execution time are currently under-examined. The two studies experimentally tested whether temporal discounting applies in a situation where the decision and the interaction are separated by an extended execution time. This separation arose because the participants were first required to select their preferred app, followed by an expense form task that took approximately 10 s to complete. The decision to complete the task using a certain app was thus separated from the task itself, allowing for a greater understanding of how temporal discounting affects decision making. Our results support the idea that behavioural instigation (i.e. the decision to act) and execution (i.e. action itself) are separate processes, a distinction that has received some recent support (Phillips and Gardner, 2016;Phillips et al., 2019). Therefore, it may be important to consider rewarding the decision to interact with the app when attempting to promote higher app engagement, which may be achieved by presenting a reward immediately after opening the app.

Limitations and future work
Similar to many lab-based decision-making experiments in cognitive science and in HCI (e.g. Ashktorab et al., 2019;Daw et al., 2011;Otto et al., 2013), our study asked participants to make a number of app selection decisions within one experimental session. While this method allowed us to interpret our findings in light of this established literature base, future work should expand on our promising results. This could be done by performing a field trial comparing different app versions that vary in reward placement to examine whether this has a significant influence on login rates. This is important because the significant effects measured in the two studies were for options that were otherwise equivalent. As such, it is possible that for real world scenarios, where options are generally very different, the magnitude of the effect may be reduced when compared to these more controlled settings. Within our study, the absolute probabilities of choosing pre-task placement over post-task placement were found to be 58.7% and 62% for studies 1 and 2 respectively, with a relative increase compared to that expected due to chance (50% probability) of 17.4% and 24% respectively. It is important for future work to see whether this increase in selection frequency creates a meaningful difference when deployed in real-world applications.
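The relative increases quoted above follow from dividing the gain over chance by the chance level itself. A minimal worked check, with a hypothetical helper name, is:

```python
def relative_increase_over_chance(probability, chance=0.5):
    """Gain over the chance level, expressed as a fraction of the chance level."""
    return (probability - chance) / chance

study1_increase = relative_increase_over_chance(0.587)  # pre-task vs post-task, Study 1
study2_increase = relative_increase_over_chance(0.62)   # pre-task vs post-task, Study 2

print(f"Study 1: {study1_increase:.1%}")  # 17.4%
print(f"Study 2: {study2_increase:.1%}")  # 24.0%
```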
Additionally, the studies only involved one experimental session, and while this approach is common in the literature, it means that any reduction in the efficacy of the early reward technique over time could not be observed. This is a frequent issue in the gamification literature as strong initial effects usually decrease towards the end of these studies (Johnson et al., 2016;Koivisto and Hamari, 2014). Nonetheless, other research has been able to show sustained improvements in engagement (Farzan et al., 2008;Morschheuser et al., 2015), highlighting how differences in how gamification is applied may be behind these decreases in effectiveness. As a result, further research is needed to examine the longer term outcomes of early reward strategies.
Our sample focused on UK-based MTurk workers. MTurk workers were used because they tend to be more culturally heterogeneous and socioeconomically diverse than usual university samples (Henrich et al., 2010), which improves the applicability of results to a larger population. However, further work is also needed to explore whether the effects seen in our work apply to non-MTurk samples as well as other nationalities and cultural backgrounds. In addition, as there is an ongoing controversy regarding the effect of rewards on intrinsic motivation (Bright and Penrod, 2009;Cameron et al., 2001;Cameron and Pierce, 1994), researchers may need to examine whether the early reward strategy contributes to the undermining effect (Deci and Ryan, 1985;Ryan and Deci, 2000). Therefore, future work may include direct measures of intrinsic motivation to ascertain if it is affected by changes in reward placement.
Although small, self-reported preference data in Study 1 shows some participants preferred the apps where a reward was delayed, while some participants in Study 2 preferred the app where no reward was given. As most apps gave the same rate of reward, only varying in its placement, some participants may have formed opinions on which app was best based on other factors. These could include the perceived difficulty in categorising the expense items for a particular app, or a general preference for an app's symbol or colour. Even though all of these dimensions were randomised within the experiment across the apps, this type of reasoning was sometimes reflected in some of the answers to the post-experiment questionnaire. As these dimensions were randomised across the experiment, this is not likely to have significantly affected our results.
As preference for earlier rewarding suggests a model-free process, it could be argued that participants had a natural expectation that they needed to collect as many points as possible, and therefore may have been actively using MB processing to guide them towards selecting the reward-producing options. While this is a possible explanation as to why the no-reward placement was the least preferred option within the experiment, it would not explain the preference for pre-task placement. Both the pre- and post-task placements presented the same magnitude and rate of reward, presumably making them equivalent to an MB system. As other research indicates that the MB system does not discount rewards significantly based on a delay of seconds (Green and Myerson, 2004), MF processing is still a more likely explanation for the difference in preference seen in each study.
It may also be the case that some participants were meta-cognitively aware of how the rewards affected their MF system, and thus employed some MB processing to help maximise the rewards with the strongest effect. While MF processing is believed to occur automatically and outside conscious awareness (Otto et al., 2013), there is evidence that collaboration occurs between systems (Balleine and Dezfouli, 2019). The MF system is also believed to signal its desires through impulses and emotions (Gardner, 2015;Lally and Gardner, 2013), making it possible that these feelings can be acknowledged by the MB system. Further work needs to be conducted to disentangle the role of MB and MF processes in reward-based decision making seen in this work.
Although targeting the MF system can benefit the user by reducing system conflict, affecting automatic behaviours through the use of particular cues can be exploited through dark design patterns (Greenberg et al., 2014). Pop-ups exemplify this unethical 'Bait and Switch' behaviour by creating realistic-looking windows and dialog boxes (cues) whose function is changed to bring about undesired results (such as opening malicious programs). This type of hijacking of automatic behaviours is also common in phishing scams, where malicious actors present a familiar interface (e.g. the PayPal website) in the hope that users will provide important personal information (Dhamija et al., 2006). There is a danger that unscrupulous designers may use the insights from these studies to motivate users, against their will, to use apps they are trying to avoid. However, there is evidence to suggest that the development of new MF behaviours needs at least some intention from the user to be effective (Gardner et al., 2020). As a result, we suggest that this technique only be used to reduce system conflict for behaviours the user already has some intention to perform. Currently, however, there is little discussion about the ethical considerations of targeting the MF system. As such, the potential harms from misappropriation of this research are still unknown.

Conclusion
Gamification is a common technique for incentivising users to engage with an application. Our work sought to identify how to most effectively design reward schedules to promote app selection when using gamified rewards. Consistent with the concept of temporal discounting (Ainslie, 1975), our findings suggest that placing rewards closer to the decision to open the application is more effective at promoting further app selection than placing rewards at the end of longer app interactions. Using such a reward structure may therefore be more effective in encouraging users to return to such applications. Rather than rewarding after longer interactions with the app, which is common in current gamified applications, designers should consider rewarding users early for deciding to interact with the app in the first place.
The work also shows how an established theory from psychology and cognitive science, temporal discounting (Ainslie, 1975), is applicable to gamified rewards, opening the door for other theories to be extended to this context. In addition, it crystallises the importance of using supported theories as a foundation of any gamification research, which has focused too deeply on finding the most effective gamified rewards (Seaborn and Fels, 2015), rather than on understanding the underlying mechanisms that make these rewards motivating in the first place. By researching this issue further, future researchers may be better able to optimise the value of gamified rewards by making intentional, theory-driven, and targeted changes to key characteristics of the incentive system.

Data availability
The datasets and analyses used for the current research are available in the Open Science Framework Repository at https://osf.io/46xu7/?view_only=ff40fd7b38954f2082db8dd3e77504dd

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.