Do MTurkers Exhibit Myopic Loss Aversion?

We present results from a highly powered online experiment with 937 participants on Amazon Mechanical Turk (MTurk) that examined whether MTurkers exhibit myopic loss aversion (MLA). The experiment measured MLA-compliant behavior in two between-subjects treatments that differed only in the risk profile of the risky asset employed. We found no statistically significant evidence of MLA-compliant behavior among MTurkers in either treatment.
JEL classification: G10, G11, G41


Introduction
The concept of myopic loss aversion (MLA) was originally introduced by Benartzi and Thaler (1995) as a possible explanation for the equity premium puzzle (Mehra and Prescott, 1985). MLA describes the tendency of individuals to frame decisions narrowly, i.e., to evaluate investments frequently or to segregate them, which is based on mental accounting (Kahneman and Tversky, 1984; Thaler, 1985; Kahneman and Lovallo, 1993; Thaler et al., 1997; Lee and Veld-Merkoulova, 2016) and makes them more prone to existing loss aversion (Kahneman and Tversky, 1979). Such behavior has been associated with a negative impact on individuals' financial decision making (Looney and Hardin, 2009). There is a substantial body of empirical evidence supporting the theory of MLA. In particular, MLA-compliant behavior has been shown among university students in individual decisions (Keren and Wagenaar, 1987; Gneezy and Potters, 1997; Thaler et al., 1997; Bellemare et al., 2005; Langer and Weber, 2005; Fellner and Sutter, 2009) and in experimental market situations (Gneezy et al., 2003). In addition, Sutter (2007) has shown that teams of students as decision makers display MLA. It has further been demonstrated that not only students, but also individuals from the general population (Van der Heijden et al., 2012), financial experts (Haigh and List, 2005; Eriksen and Kvaloy, 2010; Larson et al., 2012), and private investors (Wendy and Asri, 2012) behave in accordance with MLA theory. Furthermore, there is evidence of MLA-compliant behavior in the contexts of retirement savings and insurance (Benartzi and Thaler, 1999; Papon, 2008).
We conducted an online experiment to investigate whether the concept of MLA can be generalized to the behavior of crowd workers on Amazon Mechanical Turk (MTurk), a subject pool that is frequently recruited for online social science experiments (Chandler and Shapiro, 2016). To do so, we implemented on MTurk the lottery investment framework established by Gneezy and Potters (1997), which is the foundation of the most frequently applied measurement of MLA (see, e.g., Bellemare et al., 2005; Haigh and List, 2005; Fellner and Sutter, 2009). Crowd workers on MTurk have been shown to produce results similar to those in laboratory experiments (Paolacci et al., 2010; Crump et al., 2013) and to reliably and consistently report characteristics such as demographics and risk preferences that have been found to correlate with actual risk-taking in simple lottery experiments (Johnson and Ryan, 2020).
Given that the predictions of MLA theory do not explicitly differ between various types of mixed gambles as long as they are characterized by a positive expected value (Haisley et al., 2008), as an exploratory extension we also tested whether design changes regarding the risk profile of the lottery affect participant behavior. For this reason, in addition to the lottery by Gneezy and Potters (1997), we applied a second lottery based on Charness and Gneezy (2010), which also has a positive expected value but is more attractive in terms of both the expected value and the probabilities of gaining and losing. We did not find evidence of behavior consistent with MLA in MTurkers in either treatment. Thus, we provide results that question the generalizability of the concept of MLA across groups of people. In addition, we found no difference-in-differences effect between the two treatments, suggesting that the differences in the risk profiles do not statistically significantly affect the impact of varying feedback and decision frequency on participants' risk-taking. Finally, on a more general level, we found higher overall risk-taking for the more attractive lottery.
With this study, we contribute mainly to two strands of the literature. First, we contribute to the general literature on MLA discussed above. Specifically, we apply the Gneezy and Potters (1997) MLA framework to MTurkers, i.e., a pool of subjects that, to the best of our knowledge, has not yet been investigated in this respect. In doing so, we test the external validity of MLA and, in particular, the generalizability of the concept across groups of people. Second, by testing MLA based on lotteries with different risk profiles, we contribute to a smaller part of the literature that examines the robustness and universality of MLA with respect to differences in the characteristics of the underlying risky asset. Beshears et al. (2017) have shown that the behavioral prediction of MLA theory does not necessarily remain accurate once more realistic parameters of risky assets are used, such as actual financial market data. Haisley et al. (2008) have provided evidence that for mixed gambles with negative expected value, such as state lotteries, broad bracketing, i.e., aggregating the outcomes of multiple games, does not increase risk-taking but decreases it. However, this is consistent with the notion of MLA implying better outcomes when decisions are considered in a broader frame. Similar results, but for lotteries with positive expected value, have been obtained by Langer and Weber (2001, 2005). The authors have provided a compelling argument to extend the concept of MLA to the concept of myopic prospect theory (MPT) in order to also explain non-unidirectional effects of varying feedback and decision frequency on decisions under risk with positive expected value. In addition, studies that have looked at the causes of MLA-conforming behavior, i.e., feedback and/or decision frequency, have provided mixed results (Bellemare et al., 2005; Langer and Weber, 2008; Fellner and Sutter, 2009). As Zeisberger et al. (2012, p. 46) have aptly put it: "What can be learned from the large body of research on myopia and investment is that there is obviously considerable heterogeneity in individual behavior and minor design issues that had not been considered to be relevant beforehand might have a major impact on the results." In particular, we contribute to this strand by measuring MLA-compliant behavior across decision situations, implementing on MTurk two mixed gambles based on Gneezy and Potters (1997) and Charness and Gneezy (2010) that differ in attractiveness in terms of win and loss probabilities and expected values.

Experimental Design and Procedure
Following the procedure of Gneezy and Potters (1997), participants made a betting decision for each of nine rounds. Specifically, each participant i decided on an amount $x_i \in [0, 200]$ of a per-round endowment of 200 tokens to bet in a risky lottery. Participants were randomly assigned to one of two groups, i.e., a high-frequency or a low-frequency sub-treatment, which differed only in terms of feedback and decision frequency. In the high-frequency sub-treatment, participants chose the amount to bet in the risky lottery in each of the nine rounds and were informed after each round about the outcome of the lottery and their earnings from that round. In contrast, in the low-frequency sub-treatment, participants decided in rounds 1, 4, and 7 on the amount to bet in the lottery for three consecutive rounds. Decisions were binding, so the amount bet in the low-frequency sub-treatment remained unchanged for three consecutive rounds.
Participants in the low-frequency sub-treatment were informed about the outcomes of the lotteries and their aggregated earnings only after every third round (i.e., in round 3, the aggregated earnings from rounds 1-3; in round 6, the aggregated earnings from rounds 4-6; and in round 9, the aggregated earnings from rounds 7-9 were shown). In line with MLA theory, we predicted that participants in the low-frequency sub-treatment would bet higher amounts than participants in the high-frequency sub-treatment, because the lotteries are perceived as more advantageous when their results are presented in a more aggregated way. This procedure was applied in two between-subjects treatments, to which participants were randomly allocated. In the first treatment, we applied the original lottery by Gneezy and Potters (1997), which reads as follows: 2 "You have a chance of 2/3 (67%) to lose the amount you bet and a chance of 1/3 (33%) to win two and a half times the amount you bet."
In the second treatment, we introduced the lottery established by Charness and Gneezy (2010) with the following risk profile: "You have a chance of 1/2 (50%) to lose the amount you bet and a chance of 1/2 (50%) to win two and a half times the amount you bet." 3 We thus obtained two treatments that differed only in terms of the risk profiles of the lotteries employed. Specifically, the lottery by Gneezy and Potters (1997) was characterized by an expected value of $E(x_i) = 0.17$ for $x_i = 1$ and a loss probability of $pr_{loss} = 67\%$. The lottery by Charness and Gneezy (2010) was characterized by an expected value of $E(x_i) = 0.75$ for $x_i = 1$ and a loss probability of $pr_{loss} = 50\%$. Thus, the lottery by Charness and Gneezy (2010) appears notably more attractive from an expected utility viewpoint, and we hypothesized that overall risk-taking would be higher in this treatment than in the treatment with the lottery by Gneezy and Potters (1997). However, according to the theory of MLA, we further hypothesized that the treatments would not differ with respect to MLA. For a given round t, participant i's earnings $\pi_{i,t}$ followed from the lottery rules: $\pi_{i,t} = 200 + 2.5\,x_i$ if the lottery was won and $\pi_{i,t} = 200 - x_i$ if it was lost, where $x_i$ is the amount bet in that round. For each treatment, we implemented the two sub-treatments, i.e., high-frequency and low-frequency, varying the decision and feedback frequency, as summarized in Table 1.
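The expected values and per-round earnings of the two lotteries can be checked with a short calculation. The following is a minimal sketch, assuming that "win two and a half times the amount you bet" denotes a net gain of 2.5 times the bet on top of the 200-token endowment; the function names are hypothetical and not part of the experimental software.

```python
from fractions import Fraction

ENDOWMENT = 200                 # tokens available in every round
WIN_MULTIPLE = Fraction(5, 2)   # net gain per token bet when the lottery is won


def expected_gain_per_token(p_win):
    """Expected net gain of betting one token: win 2.5 with p_win, lose 1 otherwise."""
    return p_win * WIN_MULTIPLE - (1 - p_win)


def round_earnings(bet, won):
    """Earnings in one round for a given bet (assumed payoff rule, see lead-in)."""
    if not 0 <= bet <= ENDOWMENT:
        raise ValueError("bet must lie in [0, 200]")
    return ENDOWMENT + WIN_MULTIPLE * bet if won else ENDOWMENT - bet


# Gneezy and Potters (1997): win probability 1/3
gp_value = expected_gain_per_token(Fraction(1, 3))
# Charness and Gneezy (2010): win probability 1/2
cg_value = expected_gain_per_token(Fraction(1, 2))

print(round(float(gp_value), 2), round(float(cg_value), 2))
```

Exact fractions are used so that the rounded values can be compared directly with the expected values of 0.17 and 0.75 reported above.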

Table 1: Treatments and sub-treatments. Each of the two lotteries, Gneezy and Potters (1997) and Charness and Gneezy (2010), defines one treatment, and each treatment comprises a high-frequency and a low-frequency sub-treatment.

2 To ensure a valid comparison with Gneezy and Potters (1997), the instructions in our study were virtually equivalent to those in the original paper. Our instructions differed only with respect to the implementation of the lottery draw, which in our study was performed by a computer.
3 Following Langer and Weber (2001, 2005), we calculated whether MPT can explain possible reversed behavioral patterns when participants are confronted with this lottery. Assuming estimated probability weights of $\gamma^+ = 0.61$ and $\gamma^- = 0.69$ and the weighting and value functions of Tversky and Kahneman (1992), there are no values of $\alpha$, $\beta$, and $\lambda$ for which $S_1(x) > 0$ and $S_3(x) < 0$ hold simultaneously, i.e., the myopic value $S_1(x)$ of this lottery can never be positive if the non-myopic value $S_3(x)$ of this lottery is not positive at the same time. Thus, MPT would not predict that participants are willing to invest in this lottery in the myopic case (high-frequency sub-treatment) while not being willing to invest in the non-myopic case (low-frequency sub-treatment).
In an exit questionnaire, we asked participants about their general and financial risk preferences and their demographic and socioeconomic characteristics, such as age, gender, education, and annual gross income, as well as their financial education and investment experience. 4 We conducted a highly powered trial. Ex-post power analyses showed that our sample sizes of N = 473 in the treatment with the Charness and Gneezy (2010) lottery and N = 464 in the treatment with the Gneezy and Potters (1997) lottery gave us 80% power to detect a small effect of Cohen's d = 0.20 with respect to differences in risk-taking between the high-frequency and low-frequency sub-treatments in both treatments. Specifically, in the treatment with the Gneezy and Potters (1997) lottery, we achieved a statistical power of approximately 99% to detect 71% of the original standardized effect size of Cohen's d = 0.63 in Gneezy and Potters (1997). 5 This effect relates to the measurement of MLA over all nine rounds, which is the focus of this paper.
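To illustrate how a power figure of this kind can be approximated, the sketch below uses a normal approximation to the two-sided, two-sample t-test. It is not the procedure used for the reported analyses: the equal split of N = 464 into two groups of 232 and the function name are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test for effect size d
    (normal approximation; a t-based calculation would differ slightly)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Non-centrality: standardized effect times the effective sample size term
    ncp = d * sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# 71% of the original effect size d = 0.63 reported by Gneezy and Potters (1997)
target_d = 0.71 * 0.63

# Hypothetical equal split of the 464 participants into two sub-treatments
power = power_two_sample(target_d, 232, 232)
print(f"d = {target_d:.3f}, approximate power = {power:.3f}")
```

Under these assumptions the approximate power exceeds 99%, in line with the figure stated above for the replication of the original lottery.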
The experiment was conducted online with 937 US participants on Amazon MTurk. The average age of the participants was 37 years; 34% of participants were female and 66% were male (see Table A1 for details and further demographic and socioeconomic information). 6 Experimental sessions were held in August and September 2020 and in January 2021. The average time participants spent on the experiment was 7.10 minutes (SD: 5.75 minutes). Participants received a flat fee of $0.75 plus an average bonus incentive of $1.45 (SD: 0.48) based on their decisions and lottery outcomes. This corresponds to an average hourly wage of $18.59. The experiment was programmed using oTree (Chen et al., 2016). 7

Results
Figure 1 shows the average round bet over nine rounds as a percentage of the initial endowment of 200 tokens for both treatments and both sub-treatments. To begin our analyses, we first consider the treatment with the Gneezy and Potters (1997) lottery, which was an online replication of Gneezy and Potters (1997). 8
Result 1: MTurkers in the treatment with the Gneezy and Potters (1997) lottery did not exhibit behavior consistent with MLA.
Although the difference displayed the sign predicted by MLA theory, a two-sided unpaired sample t-test indicated that the small difference between the high-frequency and low-frequency sub-treatments in terms of the average percentage bet in the lottery (Cohen's d = 0.15) was not statistically significant, as can be seen at the top of the corresponding first pair of bars in Figure 1 (see Table A3 for details). Thus, in contrast to our hypothesis, we did not find evidence that MTurkers exhibit MLA-compliant behavior. The results contradict the findings of previous studies that have used this experimental design and have found statistically significant evidence of MLA-conforming behavior among different groups, e.g., university students or financial professionals (Gneezy and Potters, 1997; Gneezy et al., 2003; Bellemare et al., 2005; Fellner and Sutter, 2009).

4 The self-reported general and financial risk preferences were based on the German SOEP questionnaire (Dohmen et al., 2011).
5 Recent evidence on the replicability of social science experiments has provided an estimate of the average relative effect size of true positives of approximately 71% (Camerer et al., 2018).
6 We performed extensive randomization checks to test whether the distributions of demographic, socioeconomic, and risk-taking characteristics differed between treatments and sub-treatments. We found no statistically significant differences in participant characteristics between treatments and sub-treatments, indicating a successful randomization procedure (see Table A2 for details).
7 We refer to the Appendix for screenshots of the software. The experimental software can be accessed using the following link.
8 We applied significance levels of 5% and 0.5% for all statistical tests in this paper (Benjamin et al., 2017) and took a conservative approach by conducting two-sided tests, which was further justified by the empirically confirmed possibility of reverse effects (Langer and Weber, 2001, 2005).
Figure 1: The figure shows the average round bet over nine rounds as a percentage of the initial endowment of 200 tokens for both treatments and sub-treatments (dark gray and light gray bars distinguish the two sub-treatments). Whiskers denote 95% confidence intervals. p indicates p-values of two-sided unpaired sample t-tests between the high-frequency and low-frequency sub-treatments. Letters, i.e., a and b, indicate significance groupings with respect to overall risk-taking. Conditions with distinct letters differ statistically significantly regarding the average total risk-taking, pooled over both sub-treatments (two-sided unpaired sample t-test, α = 0.05).
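Result 2: We did not find behavior consistent with MLA in the treatment with the Charness and Gneezy (2010) lottery and did not find evidence of a variation in the difference in risk-taking between the high-frequency and low-frequency sub-treatments across treatments.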
As shown in Figure 1, we found no statistically significant difference in MTurkers' risk-taking between the high-frequency and low-frequency sub-treatments in the treatment with the Charness and Gneezy (2010) lottery, as indicated by the corresponding p-value above the bars obtained from a two-sided unpaired sample t-test (see Table A3 for details). 9 Interestingly, although not statistically significant, an inverse pattern compared with that predicted by MLA seemed to occur, i.e., participants in the high-frequency sub-treatment bet more than participants in the low-frequency sub-treatment (high-frequency: 0.471, low-frequency: 0.450, difference = 0.021; p = 0.46; N = 473).
The absence of statistical support for the treatment effects is not sufficient evidence of null effects. As our study was highly powered, we performed equivalence tests (TOST) to also test for equivalence with the null hypothesis in both treatments. 10 We followed the approach of Juzek and Kizach (2019) to obtain objective values for the parameter delta (δ), the minimum worthwhile effect size, based on our data (δ = ±0.09 in both treatments). For these values of δ, equivalence with the null hypothesis regarding the difference in risk-taking between the high-frequency and low-frequency sub-treatments could be statistically supported (Tryon and Lewis, 2008) in both treatments (Gneezy and Potters (1997) lottery: $p(T > t_1) < 0.005$, $p(T > t_2) = 0.0493$; Charness and Gneezy (2010) lottery: $p(T > t_1) = 0.007$, $p(T > t_2) < 0.005$). Conducting further equivalence tests in the treatment with the Gneezy and Potters (1997) lottery, we were able to statistically rule out a difference in risk-taking between the high-frequency and low-frequency sub-treatments of more than about 9 percentage points (Cohen's d = 0.30). In the treatment with the Charness and Gneezy (2010) lottery, we were able to statistically rule out a difference of more than about 6.70 percentage points (Cohen's d = 0.22).

9 As robustness checks, we also performed the analyses in both treatments using the non-parametric Mann-Whitney U test, which confirmed the results.
10 We used the user-written program tostt in Stata (Dinno, 2017).
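The logic of the two one-sided tests can be sketched as follows. This is a simplified illustration based on a normal approximation and summary statistics, not the Stata routine used for the reported analyses; the function names and the numerical inputs are hypothetical and are not the data of this study.

```python
from statistics import NormalDist


def tost_normal(diff, se, delta):
    """Two one-sided tests for equivalence within [-delta, +delta].

    diff  : observed difference in means
    se    : standard error of that difference
    delta : equivalence margin (minimum worthwhile effect size)
    Returns the p-values of the lower and the upper one-sided test.
    """
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + delta) / se)  # H0: true difference <= -delta
    p_upper = 1 - z.cdf((delta - diff) / se)  # H0: true difference >= +delta
    return p_lower, p_upper


def is_equivalent(diff, se, delta, alpha=0.05):
    """Equivalence is supported only if both one-sided tests reject."""
    return max(tost_normal(diff, se, delta)) < alpha


# Hypothetical inputs: a small difference is equivalent to zero for delta = 0.09,
# whereas a difference close to the margin is not.
print(is_equivalent(0.02, 0.03, 0.09))
print(is_equivalent(0.08, 0.03, 0.09))
```

Equivalence requires both one-sided null hypotheses to be rejected, which is why two p-values are reported for each treatment above.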
Next, we ran multivariate Tobit regressions with the average lottery bets over nine rounds as the dependent variable to examine the robustness of the results and to test for a difference-in-differences effect (see Table A4 for details). All previous results were confirmed, and we found no variation in the differences in bet amounts between participants in the high-frequency group and the low-frequency group across treatments, i.e., no statistically significant difference-in-differences effect, as indicated by the coefficient of the interaction term between the treatment dummy and the low-frequency dummy in models I and II in Table A4.
Finally, we tested for aggregate differences in risk-taking across treatments, pooling both sub-treatments. The letters at the top of Figure 1 denote significance groupings with respect to differences in aggregate risk-taking. Treatment conditions with distinct letters differed statistically significantly in the mean percentage round bet over nine rounds in two-sided unpaired sample t-tests (α = 0.05). As hypothesized, we found that MTurkers in the treatment with the Charness and Gneezy (2010) lottery took more risk than MTurkers in the treatment with the Gneezy and Potters (1997) lottery (Gneezy and Potters lottery: 0.401, Charness and Gneezy lottery: 0.461, difference = -0.060; p = 0.002; N = 937, see Table A3 for details).

Conclusion
We conducted a highly powered online experiment with 937 participants on Amazon MTurk to test whether MTurkers exhibit MLA. In doing so, we carefully followed the lottery framework of Gneezy and Potters (1997). Our findings do not confirm MLA-compliant behavior for MTurkers in either treatment: we found small, statistically insignificant differences in risk-taking between the high-frequency and low-frequency sub-treatments, and equivalence tests ruled out standardized differences greater than Cohen's d = 0.30 in the treatment with the Gneezy and Potters (1997) lottery and d = 0.22 in the treatment with the Charness and Gneezy (2010) lottery. In addition, we found no difference-in-differences effect between treatments, indicating no effect of the varying risk profiles on the differences in risk-taking between the two sub-treatments. The results survived multiple robustness checks. With these findings, we join a growing body of scientific literature suggesting that the relationship between variations in decision and feedback frequency and risk-taking behavior is more complex than has long been assumed. We conclude that the results of previous studies on MLA, or at least the magnitude of the results, are not readily generalizable to other groups of people, which we have shown for MTurkers, a subject pool frequently recruited for online social science experiments (Chandler and Shapiro, 2016).

11 We used the user-written program "ritest" in Stata (Heß, 2017).
(0: not at all willing to take risks, 10: very willing to take risks);

A1 Additional figures and tables
Table A2: Randomization checks across treatments and sub-treatments. The age variable indicates the participants' age in years; the gender variable is a binary dummy taking the value of 0 for female participants and 1 for male participants. The field-of-study variable is a binary variable that equals 1 for participants enrolled in economics, business, or business law and 0 for all other study programs. The financial-background variable is a dummy taking the value of 1 for decision makers who have already worked in the financial sector or who have specific financial education and 0 for participants who have not. The investment-experience variable is a binary dummy taking the value of 1 for participants who have invested in financial products in the last five years. The income variable is an ordinal variable comprising the total annual gross income quintiles in the US. The education variable is a 6-item ordinal variable taking the value of 0 for participants with nursery school completed up to a value of 6 for participants with a PhD.

Notes to the Tobit regressions (Table A4): The treatment dummy takes the value of 1 for participants in one treatment and 0 for participants in the other treatment. The sub-treatment dummy takes the value of 1 for decision makers in the low-frequency feedback sub-treatment and 0 for their peers in the high-frequency feedback sub-treatment. The × term represents the interaction between the treatment dummy and the sub-treatment dummy. The age variable indicates the participants' age in years; the gender variable is a binary dummy taking the value of 0 for female participants and 1 for male participants. The financial-background variable is a dummy taking the value of 1 for decision makers who have already worked in the financial sector or who have specific financial education and 0 for participants who have not. The investment-experience variable is a binary dummy taking the value of 1 for participants who have invested in financial products in the last five years. The income variable is an ordinal variable comprising the total annual gross income quintiles in the US. The education variable is a 6-item ordinal variable taking the value of 0 for participants with nursery school completed up to a value of 6 for participants with a PhD. The two risk-preference variables are ordinal variables representing self-reported risk preferences on a 10-point Likert scale in the financial domain and in the general domain, respectively. "Permute p" reports the p-values of the corresponding coefficient, obtained from permutation tests with 1,000 random draws.