Identifying social norms using coordination games: Spectators vs. stakeholders

We investigate social norms for dictator game giving using a recently proposed norm-elicitation procedure (Krupka and Weber, 2013). We elicit norms separately from dictator, recipient, and disinterested third party respondents and find that elicited norms are stable and insensitive to the role of the respondent. The results support the use of this procedure as a method for measuring social norms. © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
A variety of economic phenomena, from punishing non-cooperators to tipping in restaurants, have been explained as norm-driven behavior. 1 Recently economists have moved beyond post-hoc appeals to the explanatory power of social norms, and begun to incorporate the analysis of norms into positive economics by measuring social norms in experiments. For example, recent papers have used a norm-elicitation task introduced by Krupka and Weber (2013, hereafter KW) to study norm-driven behavior in dictator games (KW), gift-exchange games (Gächter et al., 2013), oligopoly pricing games, and to explain the behavior of financial advisers and their supervisors towards their clients (Burks and Krupka, 2012).
* Correspondence to: School of Economics, University of Nottingham, University Park, NG7 2RD, United Kingdom. Tel.: +44 0 1158467492. E-mail address: daniele.nosenzo@nottingham.ac.uk (D. Nosenzo).
1 See Elster (1989) for a discussion and interpretation of how norms influence behavior in a variety of settings.
In the KW method for measuring social norms a scenario is described to subjects, who then have to judge each action in the scenario as ''very socially inappropriate'', ''somewhat socially inappropriate'', ''somewhat socially appropriate'', or ''very socially appropriate''. A subject receives a reward if her evaluation agrees with that of other subjects. Thus, subjects have an incentive to reveal what they perceive to be the collectively-shared judgment of appropriateness of the actions they evaluate, and not their own personal judgment. 2 In principle, the norm-elicitation task could be given to either interested parties (''stakeholders'') or, as in most previous applications, to impartial observers (''spectators''). For example, norms about dictator game giving could be elicited from dictators, recipients, or disinterested third parties. Under the assumption that a stable norm about what constitutes socially appropriate behavior exists, and that subjects use this as a coordination device, each of these sub-groups is incentivized to reveal the underlying norm. If measured norms differ across these sub-groups, this suggests either that norms are malleable, or that subjects' responses are revealing something other than the social norm. This in turn would raise questions about the usefulness or validity of KW elicited norms.
2 The material incentives used in the norm-elicitation task generate a coordination game with multiple equilibria. See KW for a discussion of how coordination games can be used to elicit social norms in an incentive-compatible way. See also Xiao and Houser (2005) and Houser and Xiao (2011) who use a related approach to incentivize evaluators to classify natural language messages with commonly shared meanings.
One reason norms may be malleable is that stakeholders may manipulate their responses to justify their actions. For example, in dictator games a selfish dictator may distort her judgment of appropriateness to rationalize why she is not giving any money to the recipient. 3 Moreover, stakeholders may use some feature of their experience in the game as a coordination device. For example, in dictator games participants may give responses that reflect what they did or observed others do in the game. More generally, norms may vary depending on the identity and role of the respondent. For instance, norms of distributive justice may vary depending on one's relative income.
In this paper we present an experiment examining the KW norm-elicitation task, focusing on whether measured norms vary according to the role of respondents. In particular we elicit norms about dictator game giving and test whether measured norms differ among stakeholders (dictators and recipients), and spectators (disinterested third parties).

Experimental design
Our experiment is based on the version of the dictator game used by KW. At the outset of the game the dictator is endowed with 10 Euros while the recipient is endowed with 0 Euros. The dictator then decides how much of her endowment to give to the recipient, in increments of 1 Euro. The dictator's allocation decision determines the final payoffs for both players.
The focus of our experiment is on the social appropriateness of the actions available to the dictator in this game. We measure social appropriateness using the norm-elicitation task proposed by KW. In this task subjects read a description of the game and then rate whether each action available to the dictator is ''very socially inappropriate'', ''somewhat socially inappropriate'', ''somewhat socially appropriate'', or ''very socially appropriate''. At the end of the experiment subjects are randomly paired with another participant. One of the dictator's possible actions is then randomly selected, and both subjects receive 10 Euros if their appropriateness ratings for the selected action match, and 0 Euros otherwise.
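The payment rule of the norm-elicitation task described above can be sketched in a few lines of Python. This is only an illustration, not the z-Tree program used in the experiment: the names `coordination_payoff` and `pay_pair`, the rating labels, and the use of a dictionary keyed by the amount given are all hypothetical choices made for the sketch.

```python
import random

# Hypothetical labels for the four appropriateness ratings.
RATINGS = ("--", "-", "+", "++")
# The dictator can give 0..10 Euros in increments of 1 Euro.
ACTIONS = list(range(11))
PRIZE = 10  # Euros paid to each rater if the two ratings match


def coordination_payoff(ratings_a, ratings_b, selected_action):
    """Payoff (in Euros) to each of two paired raters for one selected action.

    ratings_a and ratings_b map each action (amount given) to a rating label.
    """
    match = ratings_a[selected_action] == ratings_b[selected_action]
    return PRIZE if match else 0


def pay_pair(ratings_a, ratings_b, rng):
    """Randomly select one action and return it with the resulting payoff."""
    action = rng.choice(ACTIONS)
    return action, coordination_payoff(ratings_a, ratings_b, action)


# Hypothetical raters who agree on every action except giving 0 Euros.
rater_a = {a: "++" for a in ACTIONS}
rater_b = dict(rater_a)
rater_b[0] = "--"
selected, payoff = pay_pair(rater_a, rater_b, random.Random(1))
```

Because only one action is selected for payment, a rater who wants to maximize her expected earnings has a reason to match the rating she expects others to give for every action, not just for some of them.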
The experiment is based on two treatments. In our Spectators treatment, as in KW, we collected social appropriateness ratings from subjects who had not previously participated in the dictator game they were asked to evaluate. Thus, raters were ''impartial spectators'' who had no interest at stake in the game. In contrast, at the beginning of a session of the Stakeholders treatment subjects were randomly assigned to the role of either dictator or recipient, and matched in pairs to play a one-shot version of the dictator game described above. After recipients had been informed about the decision of the dictator they were matched with, subjects rated the appropriateness of the actions available to dictators. 4
3 There is some evidence that stakeholders and spectators differ in the extent to which they punish or reward actions that violate or conform to norms of fairness (e.g., Fehr and Fischbacher, 2004; Croson and Konow, 2009). A possible reason for this is that spectators agree and act on norms of fairness to a greater extent than stakeholders, whose self-interest may confound their normative judgments. See Konow (2005) for a review of the literature on stakeholder biases in fairness judgments.
4 We informed recipients of the outcome of the game to ensure that both dictators and recipients entered the norm-elicitation task with the same information.
Subjects were paid to coordinate with one other randomly selected subject who had taken the same role as themselves in the game, i.e. dictators coordinated with other dictators, and recipients coordinated with other recipients. The experiment was programmed in z-Tree (Fischbacher, 2007) and was conducted at Maastricht University using 114 students recruited through ORSEE (Greiner, 2004). We conducted 2 sessions of the Spectators treatment (with 38 subjects in total) and 4 sessions of the Stakeholders treatment (with 76 subjects in total). In the Stakeholders treatment subjects were told that the experiment consisted of two parts, but were only given instructions about the norm-elicitation task at the end of the dictator game. Moreover, subjects were only paid for one task (the dictator game or the KW norm-elicitation task), randomly selected at the end of the session. Sessions lasted approximately 40 min and subjects earned on average 9.81 Euros, including a 5 Euros show-up fee. 5

Results
Fig. 1 shows the mean appropriateness ratings elicited from subjects in the Spectators and Stakeholders treatments. 6 In the latter case, we distinguish between ratings submitted by dictators and recipients. For comparison, the figure also includes the mean ratings reported by KW. Tables 1 and 2 show the full distributions of ratings in our treatments and in KW.
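To make the notion of a mean appropriateness rating concrete, the following Python sketch converts the four response categories into numerical scores and averages them for one action. The evenly spaced coding (-1, -1/3, 1/3, 1) is an assumption made for this illustration, since the text above does not state the numerical values used; the function name `mean_rating` and the example responses are likewise hypothetical and are not data from the experiment.

```python
from fractions import Fraction

# Assumed, evenly spaced numerical coding of the four response categories.
SCORE = {
    "--": Fraction(-1),     # very socially inappropriate
    "-": Fraction(-1, 3),   # somewhat socially inappropriate
    "+": Fraction(1, 3),    # somewhat socially appropriate
    "++": Fraction(1),      # very socially appropriate
}


def mean_rating(responses):
    """Average numerical score of a list of rating labels for one action."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(SCORE[r] for r in responses) / len(responses)


# Hypothetical responses from four raters for a single action.
example = ["++", "++", "+", "-"]
example_mean = mean_rating(example)  # (1 + 1 + 1/3 - 1/3) / 4 = 1/2
```

Under this coding a positive mean indicates that an action is on balance viewed as appropriate and a negative mean that it is viewed as inappropriate.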
Starting with our Spectators treatment, we note that the ratings elicited in our experiment are remarkably similar to those observed in KW (see Fig. 1 and Table 1). In both experiments, more than 80% of subjects evaluate the action (10, 0) that maximizes the dictator's payoff as ''very socially inappropriate'', and more than 80% evaluate the action (5, 5) that splits wealth equally between players as ''very socially appropriate''. In both experiments, actions leaving dictators with more than 60% of total wealth are viewed as inappropriate, whereas actions leaving dictators with 60% or less of total wealth are viewed as appropriate, although in both experiments there is less consensus about the appropriateness of actions that leave recipients with more than half of total wealth.
To detect any systematic differences between our data and KW data, we conduct Fisher's randomization tests comparing, for each action, the ratings elicited in our experiment and in KW. 7 Ten of 11 comparisons are statistically insignificant at the 10% level. The exception occurs for the action (6, 4), which our raters evaluated as somewhat more appropriate than KW's raters. However, this result should be interpreted with caution given the inflation of the overall type I error rate due to multiple testing. None of the comparisons are statistically significant if we use a Bonferroni correction to account for multiple testing, and so overall our Spectators treatment successfully replicates the KW norm-elicitation experiment.
Table 2 reports the distribution of responses of subjects in the Stakeholders treatment. There are very few differences between the ratings submitted by dictators and recipients, and these ratings are in fact very similar to those collected in the Spectators treatment. As in KW and in our Spectators treatment, both dictators and recipients generally agree that the action (10, 0) is least appropriate and the action (5, 5) is most appropriate. Moreover, for each action, the modal response by either dictators or recipients coincides with that in the Spectators treatment.
Note: responses are ''very socially inappropriate'' (−−), ''somewhat socially inappropriate'' (−), ''somewhat socially appropriate'' (+), ''very socially appropriate'' (++). Modal responses are shaded.
We use randomization tests to compare the ratings elicited in the Spectators treatment with those elicited from either dictators or recipients in the Stakeholders treatment. None of the comparisons is statistically significant at the 10% level. Moreover, we do not find any significant difference between ratings submitted by dictators and recipients. These results suggest that the KW norm-elicitation procedure is robust to potential stakeholder biases. 8
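The logic of the tests reported above can be illustrated with a short Python sketch. It is a Monte Carlo approximation to a two-sided randomization test on the difference in mean ratings between two groups, followed by a Bonferroni adjustment; it is not the code used for the analysis in this paper. The function names, the number of permutations, and the p-values in the example are hypothetical.

```python
import random


def randomization_test(x, y, n_perm=10000, seed=0):
    """Approximate two-sided randomization test for a difference in means.

    Group labels are reshuffled n_perm times; the p-value is the share of
    reshuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction for the observed sample).
    """
    rng = random.Random(seed)
    n_x, n_y = len(x), len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def bonferroni(p_values, alpha=0.10):
    """Flag which comparisons remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values for 11 per-action comparisons: one is below 10%
# on its own, but none passes the corrected threshold of 0.10 / 11.
hypothetical_p = [0.05] + [0.5] * 10
still_significant = bonferroni(hypothetical_p)
```

With 11 comparisons and a 10% level, the corrected threshold is roughly 0.009, which shows how a single comparison that is significant on its own can cease to be significant once multiple testing is taken into account.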

Conclusion
We find that norms of dictator game giving elicited from dictators and recipients are similar to those elicited from disinterested third parties. These results suggest that norms elicited using the KW procedure are not malleable to judgment biases associated with the role of the respondents, nor are they affected by respondents' previous experience with the decision setting they are asked to evaluate. This supports the use of the KW norm-elicitation procedure as a method for measuring social norms.
Our results stand in contrast with the findings reported in a recent study by Rustichini and Villeval (2014). In their experiment subjects report personal judgments about the fairness of actions available to players in dictator, ultimatum and trust games, both before and after playing the games. They find evidence of ''moral hypocrisy'' in the sense that individuals whose actions violate their initial fairness judgments manipulate their later judgments to reconcile these judgments with their actual decisions. These findings point to the vulnerability of personal judgments of fairness to self-serving distortions and manipulations.
While these findings may appear to contradict our results, we note that a crucial difference between Rustichini and Villeval (2014) and our study lies in the nature of the elicited norms. Rustichini and Villeval (2014) ask subjects to report their own personal norms, i.e. non-incentivized judgments about what they consider to be appropriate actions in a given decision setting. In contrast, the KW task aims at eliciting subjects' perception of the underlying social norm, i.e. judgments about what they perceive others may consider to be appropriate. Indeed, Burks and Krupka (2012) find systematic differences between social norms elicited using the KW task and non-incentivized personal norms. Taken together these results highlight the importance of distinguishing between personal norms, i.e. private rules or obligations that may be subject to self-serving biases and moral hypocrisy, and social norms as collectively-shared understandings of what constitutes socially acceptable behavior.
8 The results are robust to comparing medians rather than means. Median tests indicate that 42 of 44 comparisons are statistically insignificant at the 10% level. The two exceptions are between our spectators and KW for the actions (4, 6) and (3, 7). None of the comparisons are significant if we use a Bonferroni correction.