Does Power Increase Self-Control? Episodic Priming May Not Provide the Answer

Powerful people (e.g., political and business leaders) should be able to control their impulses and act in line with long-term rather than short-term interests. However, theories of power suggest different answers to the question of whether the basic experience of feeling powerful decreases (e.g., Keltner, Gruenfeld, & Anderson, 2003) or increases self-control performance (e.g., Magee & Smith, 2013). We conducted a preregistered direct replication of the only experiment testing the effects of power on self-control (Joshi & Fast, 2013, Study 3). In contrast to the original results, social power, operationalized by episodic priming, did not affect temporal discounting. A possible explanation is that the power priming failed to elevate participants’ sense of power. Thus, the null findings challenge the power priming paradigm rather than the two theories from which opposite predictions were derived. In order to understand how power affects self-control, future research may need to rely on other manipulations.

How does feeling powerful prepare individuals for exercising self-control, that is, for pursuing long-term goals? Laypeople seem to agree that powerful people such as organizational or political leaders should be particularly persistent, disciplined, and responsible (Lord, Foti, & Vader, 1984). Two influential theories in power research, the approach/inhibition theory of power (Keltner et al., 2003) and the social distance theory of power (Magee & Smith, 2013), make opposite predictions with regard to the effects of power on self-control.
Within the framework of the approach/inhibition theory of power, Keltner and colleagues suggest that (1) high power activates the behavioral approach system which is sensitive to rewards and opportunities, and (2) low power activates the behavioral inhibition system which is sensitive to punishment, threat, and uncertainty. Briefly summarized, Keltner and colleagues propose that high power triggers approach-related positive affect, attention to rewards, automatic cognition, and disinhibited behavior, whereas reduced power activates inhibition-related negative affect, systematic cognition, and situationally constrained behavior. Accordingly, due to their heightened attention to rewards and their drive to experience these rewards immediately, powerful people should show relatively poor self-control.
In contrast, the social distance theory of power (Magee & Smith, 2013) assumes that high-power individuals exhibit better self-control than low-power individuals. Magee and Smith propose that asymmetric dependence between two individuals gives rise to asymmetric experiences of social distance, with the high-power individual feeling more subjective distance than the low-power individual. Based on assumptions of construal level theory (Trope & Liberman, 2010), the authors assume that because high-power individuals perceive larger social distance, they engage in more abstract mental representation (i.e., higher level construal) than low-power individuals. High-level construals have been shown to have a positive effect on self-control (e.g., Fujita, Trope, Liberman, & Levin-Sagi, 2006; Schmeichel, Vohs, & Duke, 2011). Accordingly, due to their use of high-level construal of goals and situations, powerful people should show good self-control.
In line with the assumptions of the social distance theory of power, Joshi and Fast (2013) showed in three studies that experimentally induced social power benefits the pursuit of long-term goals by reducing the preference for smaller immediate gains over larger future gains (i.e., temporal discounting). When two theories make different predictions and only one is supported, the question arises as to what extent the other theory should be modified or discarded. Given the practical importance of self-control among powerful individuals, research must identify the conditions under which one or the other theory is correct.
However, considering that the only available evidence on the research question comes from a single lab, it seems reasonable to first ask if the effect found by Joshi and Fast (2013) is robust before future research can systematically explore moderators of the effect. This first step is an important one: As all findings result from a combination of signal (an underlying effect) and noise (systematic error in form of moderators as well as unsystematic error in form of measurement error), direct replication is the only way to separate the noise from the signal and average across different types of error (Simons, 2014).
The current work was an attempt to replicate Study 3 from Joshi and Fast (2013). This study was chosen for two reasons. First, it is the only one that had a 3-cell design (high power, low power and control condition) and would therefore show whether the effect is attributable to high or low power (Singh, 1998). Second, it was assumed that preferences for gains in air quality (nonmonetary temporal discounting) would be more comparable across industrialized nations (USA vs. Germany/Switzerland) than preferences for monetary rewards where differences in currency, purchasing power, and inflation might play a role.
In the original web-based study, 78 students experienced a power or control priming, then completed a measure of connection with their future self and finally the nonmonetary temporal discounting task. We conducted a pre-registered direct replication study (osf.io/um3rq) based on the Replication Recipe (Brandt et al., 2014) using a substantially larger sample. A successful replication would find a significant effect of the power manipulation, in that participants in the high power condition would have lower discount rates than participants in the neutral and low power conditions.

Method
All study materials and procedures can be accessed via osf.io/dqr4m. The present research was done in accordance with the checklist issued by the responsible ethics committee of the Faculty of Philosophy, University of Zurich, meaning that no formal approval was needed. This research respects the Ethical Principles of Psychologists and Code of Conduct by the American Psychological Association (APA) as well as the Ethics Guidelines for Psychologists by the Swiss Psychological Society.

Participants
Sample size was determined based on considerations of statistical power. Simonsohn (2015) noted that the conventional approach of using the effect size estimate of the original study may be problematic. First, publication bias may inflate published effect sizes. Second, a replication may be uninformative when the confidence interval of the replication effect size includes not only zero but also a detectable effect, that is, an effect size that the original study could have detected with 33% power. According to the recommendations by Simonsohn (2015), at least 2.5 times as many observations as the original study should be collected to have about 80% power to reject the null hypothesis of a detectable effect (i.e., in this case N_Original = 78, minimum N_Replication = 195). In the present case, a power analysis assuming Joshi and Fast's (2013) sample size, equal per-cell sample sizes, and statistical power of 33% indicated that the minimum detectable effect was equal to Cohen's d = 0.35. The desired sample size was set to N_Replication = 258 because this affords 80% power to reject the null hypothesis that the effect is zero if the effect is detectable (i.e., d = 0.35) and (at least) 80% power to reject the null hypothesis of a detectable effect if the effect is in fact zero.
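The sample-size reasoning above can be retraced with a few lines of arithmetic. The following sketch is an illustration only and not the authors' actual power analysis: it assumes a two-sided test at α = .05, two equally sized groups, and a normal approximation to the t distribution, and the function names are hypothetical. Under these assumptions it yields a detectable effect close to the reported d = 0.35 and roughly 80% power for N = 258.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def detectable_d(n_total: int, power: float, alpha: float = 0.05) -> float:
    """Effect size d that a two-group study with n_total participants
    detects with the given power (assumed: equal groups, normal approximation)."""
    n_per_group = n_total / 2
    # Noncentrality needed to reach the requested power in a two-sided test
    delta = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return delta / sqrt(n_per_group / 2)

def approx_power(d: float, n_total: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group test for effect size d
    (the negligible contribution of the opposite tail is ignored)."""
    n_per_group = n_total / 2
    delta = d * sqrt(n_per_group / 2)
    return Z.cdf(delta - Z.inv_cdf(1 - alpha / 2))

n_original = 78
min_n_replication = 2.5 * n_original          # Simonsohn's 2.5x rule of thumb
d33 = detectable_d(n_original, power=0.33)    # effect detectable with 33% power
power_258 = approx_power(0.35, 258)           # planned replication, d = 0.35

print(min_n_replication, round(d33, 2), round(power_258, 2))
```

Because the approximation ignores the heavier tails of the t distribution, the detectable effect it returns is slightly smaller than a value from exact t-based software would be.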
In total, 636 participants gave their consent to participate in the study (210 participants in the high power condition, 218 participants in the control condition, 208 participants in the low power condition). On the third page of the online questionnaire, where participants were meant to experience the power priming, 129 participants in the high power condition, 69 participants in the control condition, and 103 participants in the low power condition dropped out. Comparing all three groups, the dropout rates differed significantly, χ²(2) = 38.64, p < .01. Taking only the two power conditions into consideration, the dropout rates still differed significantly, χ²(1) = 5.53, p = .02, with 61% vs. 50% dropping out of the study after reading the instructions for the high vs. low power priming. This could be a cause for concern if dropout was systematically related to individual differences. Unfortunately, participants were asked for their demographics (gender and age) only at the very end of the experiment (in line with Joshi & Fast). Accordingly, we were not able to test whether gender and condition interact in predicting dropout rates. However, if this were the case, we would observe different proportions of men and women in the three conditions, which we do not, χ²(2) = 0.82, p = .66 (high power: 36% men, control: 40% men, low power: 43% men). Furthermore, if age were confounded with condition, we would observe differences in the mean age across the three conditions, which we do not either, F(2,278) = 0.04, p = .96 (high power: 27.24 years, control: 27.29 years, low power: 27.56 years). Likewise, proportions of students vs. professionals did not differ across conditions, χ²(2) = 0.09, p = .96 (high power: 45% students, control: 45% students, low power: 47% students). In light of these results, we believe that systematic dropout does not affect the validity of our manipulation.
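The two dropout tests can be recomputed from the counts given above. The sketch below uses a plain Pearson chi-square written out by hand (the function name is hypothetical). One assumption should be flagged: the reported two-condition value appears to be reproduced only when Yates' continuity correction is applied to the 2 x 2 table, which is an inference from the numbers and not something stated in the text.

```python
def chi_square(counts, yates=False):
    """Pearson chi-square for a contingency table given as rows of counts.
    With yates=True, the continuity correction is applied (2 x 2 tables only)."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat

consented = {"high": 210, "control": 218, "low": 208}
dropped = {"high": 129, "control": 69, "low": 103}

# One row per condition: [dropped out, continued]
table = {c: [dropped[c], consented[c] - dropped[c]] for c in consented}

chi2_all = chi_square(list(table.values()))                         # df = 2
chi2_power = chi_square([table["high"], table["low"]], yates=True)  # df = 1
dropout_pct = {c: round(100 * dropped[c] / consented[c]) for c in consented}

print(round(chi2_all, 2), round(chi2_power, 2), dropout_pct)
```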
Finally, several participants were excluded based on the following a priori exclusion rules: no answers to either the questions referring to the connection to the future self or the temporal discounting questions (31 participants), inconsistent discounting pattern (16 participants), no discounting at all (6 participants), no meaningful description of the situation in which they had/did not have power/were shopping (19 participants). In summary, 22 participants in the high power condition, 24 participants in the control condition, and 26 participants in the low power condition were excluded. The exclusion rates did not differ significantly across conditions, χ²(2) = 4.768, p = .092. The final sample consisted of 263 participants (M_age = 27.21 years, SD = 7.21 years): 98 men, 147 women, and 18 participants of unknown gender.
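The participant flow described above (consent, dropout, exclusion, final sample) can be checked with simple bookkeeping. The sketch below only re-derives numbers from the counts reported in the text; the variable and function names are hypothetical, and the exclusion test is an uncorrected Pearson chi-square.

```python
consented = {"high": 210, "control": 218, "low": 208}
dropped = {"high": 129, "control": 69, "low": 103}
excluded = {"high": 22, "control": 24, "low": 26}
exclusion_reasons = {
    "missing answers": 31,
    "inconsistent discounting": 16,
    "no discounting": 6,
    "no meaningful description": 19,
}

# Participants who got past the priming page, and those left after exclusions
completed = {c: consented[c] - dropped[c] for c in consented}
final = {c: completed[c] - excluded[c] for c in consented}

def chi_square_rates(events, totals):
    """Pearson chi-square comparing event rates across groups (no correction)."""
    overall = sum(events) / sum(totals)
    stat = 0.0
    for e, n in zip(events, totals):
        expected_event = n * overall
        expected_non_event = n * (1 - overall)
        stat += (e - expected_event) ** 2 / expected_event
        stat += ((n - e) - expected_non_event) ** 2 / expected_non_event
    return stat

chi2_exclusion = chi_square_rates(list(excluded.values()), list(completed.values()))

print(final, sum(final.values()), round(chi2_exclusion, 3))
```

The per-reason and per-condition exclusion counts both sum to 72, and the resulting cell sizes match those reported in the Materials and Procedure section.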

Materials and Procedure
A convenience sample was invited via snowball sampling to take part in an online study on decision behavior. Participants gave their written consent and were then randomly assigned to one of the following three conditions: participants recalled and wrote about a situation when they had power (high power, 59 participants), or when they lacked power (low power, 79 participants), or when they last went to the grocery store (control condition, 125 participants). Following the power manipulation, participants completed a measure of connection with the future self. Participants selected one of seven increasingly overlapping pairs of circles to indicate how "connected" and how "similar" they felt to their selves in 10 years. These two items were averaged to form a measure of each participant's connection with his/her future self, with higher values indicating a stronger connection.
The next part of the study consisted of the nonmonetary temporal discounting task. Within this paradigm, a participant makes a number of choices between a larger and several smaller rewards, where the smaller reward is available sooner than the larger one (Green & Myerson, 2004; Smith & Hantula, 2008). Participants were provided with eight binary choices between "improved air quality immediately for 21 days" and "improved air quality one year from now for [number of] days." The number of days in the future was 21, 23, 25, 27, 29, 31, 33, or 35. A single indifference point for each participant was obtained. This is the point at which participants equally value present and future gains. It was calculated by averaging the number of days between which participants switched from preferring the present option to preferring the future option. A high indifference point represents a tendency to prefer a smaller and more immediate reward or a failure to consider long-term potential consequences.
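The indifference-point calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring script used in the study: the function name and the encoding of choices as booleans are hypothetical, and the handling of participants who never switch or who switch more than once (returned as None, to be dealt with by the exclusion rules) is an assumption about edge cases that the text does not spell out.

```python
from typing import Optional, Sequence

# Number of days of improved air quality offered one year from now
FUTURE_DAYS = (21, 23, 25, 27, 29, 31, 33, 35)

def indifference_point(prefers_future: Sequence[bool]) -> Optional[float]:
    """Return the midpoint between the last future amount rejected and the
    first future amount accepted, or None if there is no single clean switch.

    prefers_future[i] is True if the participant chose the delayed option
    offering FUTURE_DAYS[i] days over 21 days of improved air quality now.
    """
    if len(prefers_future) != len(FUTURE_DAYS):
        raise ValueError("expected one choice per future amount")
    # Positions at which the preference changes relative to the previous item
    switches = [i for i in range(1, len(prefers_future))
                if prefers_future[i] != prefers_future[i - 1]]
    # Require exactly one switch, from the present to the future option
    if len(switches) != 1 or prefers_future[0]:
        return None
    i = switches[0]
    return (FUTURE_DAYS[i - 1] + FUTURE_DAYS[i]) / 2

# Example: prefers the immediate option up to 25 days, the future one from 27 days
example = indifference_point([False, False, False, True, True, True, True, True])
print(example)  # 26.0
```

A participant who switches late (e.g., only at 35 days) receives a high indifference point, which corresponds to steeper discounting of the future reward.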
Next, participants completed the Personal Sense of Power Scale (Anderson, John, & Keltner, 2012). Participants stated their agreement to 8 items such as "I think I have a great deal of power" on a scale from 1 (I disagree) to 5 (I agree). Cronbach's alpha was α = .72. All the original materials for the study were available from the authors. All instructions were direct translations.
After the direct replication, we collected additional measures that are not relevant for present purposes. For more information on the procedure and results related to these measures please consult the separate report on the Open Science Framework (osf.io/j67ep). Finally, participants were asked for their demographics.
A few details differed between our study and that of Joshi and Fast (2013) beyond the obvious differences in language and national context. First, we did not counterbalance the order of the connection with the future self scale and the temporal discounting task because the former was described as a mediating variable by Joshi and Fast. Second, our sample included not only students (110 participants) but also professionals (135 participants; no such information available for 18 participants). Although professionals were older (M_student = 24.55, SD = 2.92; M_professional = 29.38, SD = 8.77), t(169.02) = 6.01, p < .01, and reported a slightly higher socioeconomic status (M_student = 3.12, SD = 0.84; M_professional = 3.31, SD = 0.94), t(241.06) = 1.77, p = .08, these two groups were collapsed as the effects in the two groups were similar. Third, in the original experiment participants took part in exchange for course credit. In our replication, participants were invited to take part in a prize draw, in which they could win one of three vouchers worth 30 CHF/30 EUR (corresponding to approximately 33 USD) each.
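The fractional degrees of freedom in the age comparison suggest a t test that does not assume equal variances (Welch's test); this is an inference from the reported statistics. The sketch below recomputes the age comparison from the summary values under that assumption. It further assumes that all 110 students and 135 professionals reported their age, which may not hold exactly, so small deviations from the published values are to be expected. The function name is hypothetical.

```python
from math import sqrt

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for the difference m2 - m1, computed from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors
    t = (m2 - m1) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Age: students (assumed n = 110) vs. professionals (assumed n = 135)
t_age, df_age = welch_t(24.55, 2.92, 110, 29.38, 8.77, 135)
print(round(t_age, 2), round(df_age, 1))
```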

Manipulation Check
Two coders, who were blind to both conditions and hypotheses, categorized what type of relationship (e.g., manager-subordinate, teacher-student) was described in participants' responses to the writing prompts. They were instructed to reach agreement on this categorization. Table 1 shows that 98.3% of participants in the high power and 92.4% of participants in the low power condition wrote about experiencing power (or the lack thereof) in various relationships. Their responses were rated by the same two coders for how much power the participant reported having, using a scale ranging from 1 (none at all) to 5 (very much). The interrater reliability (agreement definition) was good, ICC = .84. We used the mean rating of the two raters as the dependent variable for the manipulation check. Participants described themselves as having more power in the high-power essays (M = 4.02, SD = 0.58) than in the low-power essays (M = 1.39, SD = 0.66), t(127) = 23.66, p < .01.

Preliminary Analyses
A one-factorial ANOVA on the personal sense of power revealed no difference across conditions, F(2,259) = 0.28, p = .75, ηp² = 0.002, 95% CI [0, 0.02]. Furthermore, there were no significant differences between conditions in temporal discounting, F(2,260) = 0.27, p = .76, ηp² = 0.002, 95% CI [0, 0.02], or the felt connection with the future self, F(2,260) = 0.20, p = .82, ηp² = 0.002, 95% CI [0, 0.02] (for descriptives see Table 2). Table 3 shows the intercorrelations of all measured variables. Only one correlation was significant: Personal sense of power was positively related to the felt connection with the future self. Although only correlational evidence, this supports Joshi and Fast's reasoning that high power should increase the connection with the future self.
Joshi and Fast (2013) reported a significant contrast between participants in the high-power condition and those in the low-power and baseline conditions combined. The same analysis performed on the present data indicated a non-significant difference in the same direction as the effect reported by Joshi and Fast (2013), t(261) = 0.71, p = .48, d = 0.09, 95% CI [-0.15, 0.33]. Thus, the confidence interval indicates that the effect is consistent with the null hypothesis of no effect but inconsistent with the null hypothesis of a detectable effect (which was determined to be d = .35; see above). As for the presumed mediator of the effect observed by Joshi and Fast (2013), high-power participants did not score higher on the measure of connection with the future self than did participants in the other two conditions, t(261) = 0.33, p = .74, d = 0.04, 95% CI [-0.20, 0.28].
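The effect sizes reported above can be recovered from the test statistics. The sketch below uses the standard identity for partial eta squared and one common conversion from t to Cohen's d, d = 2t/√df, with an approximate standard error of 2/√df. That conversion strictly applies to equal group sizes; it is used here only because it appears to reproduce the reported values, not because the text states which formula was applied. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic."""
    return f * df_effect / (f * df_effect + df_error)

def d_from_t(t: float, df: int, level: float = 0.95):
    """Cohen's d and an approximate confidence interval from a t statistic
    (assumed conversion: d = 2t / sqrt(df), SE = 2 / sqrt(df))."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    d = 2 * t / sqrt(df)
    se = 2 / sqrt(df)
    return d, d - z * se, d + z * se

eta_power = partial_eta_squared(0.28, 2, 259)   # personal sense of power
d_disc, lo_disc, hi_disc = d_from_t(0.71, 261)  # temporal discounting contrast
d_conn, lo_conn, hi_conn = d_from_t(0.33, 261)  # future-self connection contrast

print(round(eta_power, 3))
print(round(d_disc, 2), round(lo_disc, 2), round(hi_disc, 2))
print(round(d_conn, 2), round(lo_conn, 2), round(hi_conn, 2))
```

Under this approximation the upper bound of the interval for the discounting contrast falls just below the detectable effect of d = .35, which is the basis for calling the replication informative.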

Discussion
Replication constitutes an important contribution to cumulative science because it allows for testing the robustness of results and hence provides researchers with greater confidence about the existence and direction of effects (Brandt et al., 2014; Klein et al., 2014). Given the importance of understanding the relationship between social power and self-control, we sought to replicate Joshi and Fast's (2013, Study 3) finding that power increases self-control (d = .53). Using a much larger sample and procedures nearly identical to those of Joshi and Fast, we obtained a much smaller effect (d = 0.09, 95% CI [-0.15, 0.33]). This non-significant effect is informative because it is significantly smaller than an effect that would give the original study a statistical power of 33% (i.e., the minimum detectable effect; Simonsohn, 2015) and that the current research had 80% statistical power to detect (i.e., d = .35). One straightforward implication of this result is that future replication studies relying on the procedures used by Joshi and Fast (2013) should be prepared to collect even larger samples to achieve adequate statistical power.
Ultimately, though, researchers are likely to be more interested in the validity of the theoretical claims about the effects of social power than in the reproducibility of one particular study using a specific methodology. On the face of it, the null finding regarding the relation of power and discounting would seem to disconfirm the predictions derived from both the social distance theory of power (predicting a negative effect) and the approach/inhibition theory of power (predicting a positive effect). However, every empirical study involves auxiliary assumptions regarding the operations and measures used to test a theory, and those may be wrong as well. Most prominently, the validity of the manipulation and measures might be questioned. Temporal discounting as an operationalization of self-control is a well-validated and common paradigm in psychology and economics. It has successfully been used in pathological (e.g., substance addicts, MacKillop, Amlung, Few, Ray, Sweet, & Munafò, 2011; pathological gamblers, Dixon, Marley, & Jacobs, 2003) as well as in normal populations, in student as well as non-student populations (e.g., Buono, Whiting, & Sprong, 2015), in the USA as well as in Germany and Switzerland (e.g., Gianotti, Figner, Ebstein, & Knoch, 2012; Peters & Büchel, 2009).
[Table columns: Measure; High power condition (n = 59); Control condition (n = 125); Low power condition (n = 79). Table body not available.]

Admittedly, the air quality delay discounting task that was employed in the original study differed in two aspects from more widely used versions of this task: the number of delays and the discounted reward. Although it is more common to administer temporal discounting tasks with several delays, the one-shot discounting task chosen by Joshi and Fast should yield comparable results to a more comprehensive version of this task. Reimers, Maylor, Stewart, and Chater (2009) compared a one-shot discounting task with a discounting task with several delays and obtained comparable results. In a similar vein, Yi, Pitcock, Landes, and Bickel (2010) found that valid and sensitive discounting indices can be obtained with fewer indifference points than the standard number of 5 or 7. Regarding the nature of the discounted reward, an improvement in air quality for 21 days may seem relatively intangible (and maybe irrelevant) in comparison to receiving (hypothetical) monetary rewards for oneself. However, previous studies showed that temporal discounting occurs when environmental rewards are in question (e.g., air and water quality, Guyse, Keller, & Eppel, 2002; nuclear and hazardous wastes, Moser, Stauffacher, Smieszek, Seidl, Krütli, & Scholz, 2013; improvements in green space and stormwater control as well as reducing greenhouse gases, Richards & Green, 2015). Furthermore, we did not find an effect of social power on monetary temporal discounting either (see supplementary material available at osf.io/j67ep).
Regarding the manipulation of power, it must be noted that the episodic power priming is also a widespread paradigm (Galinsky, Rucker, & Magee, 2015) that has already been successfully used in German and Swiss samples (e.g., Schmid & Schmid Mast, 2013; Scholl & Sassenberg, 2014). In fact, noting that sometimes one and the same article reports similar findings across studies with priming or role-based manipulations, power researchers have argued for the superiority of priming manipulations because it "remove[s] issues of conscious awareness or intent" (Smith & Trope, 2006, p. 580) or because "it can be difficult to manipulate power in an ethical, believable, and effective way in the laboratory" (Smith & Galinsky, 2010, p. 928).
However, we think it is possible that the simplicity of this paradigm is offset by its inability to reliably produce a sense of power that would affect participants' decision-making, at least when administered online. Although our manipulation check indicated that participants followed instructions and provided examples of situations that differed in experienced power, a closer look at the properties of the texts produced by participants reveals that the intensity of the manipulation may have been rather low. Participants in this online study wrote on average 264 characters (SD = 246, range from 13 to 2352 characters) and took a median of 197 seconds to do so. Assuming that the average reading speed (German, aloud) is 11.5 characters per second (SD = 5.5, Trauzettel-Klosinski & Dietz, 2012) and the average typing speed is 2.82 characters per second (Soukoreff & Mackenzie, 1995), participants would have needed about 118 seconds on average for reading and writing and accordingly would have had 79 seconds left to find a suitable situation and put themselves in this situation. This might not have been enough time to really experience the imagined situation. In fact, the differences in sense of power that we observed across conditions were non-significant and negligible in terms of effect size. Although this may in part be due to the fact that we did not modify the original items so that they would explicitly refer to the current situation (as opposed to the dispositional sense of power), it is noteworthy that a standard power manipulation did not leave its mark on a reliable measure of felt power.
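The time budget in this argument can be laid out step by step. The sketch below uses only the figures given above. The length of the instructions is not reported, so the reading time is derived as whatever remains of the stated 118 seconds after typing; the implied instruction length is therefore an inference and not a reported value. Variable names are hypothetical.

```python
mean_chars_written = 264           # average length of the essays
median_seconds_on_page = 197       # median time spent on the priming page
reading_speed = 11.5               # characters per second (German, aloud)
typing_speed = 2.82                # characters per second
reported_read_write_seconds = 118  # reading plus writing time stated in the text

# Time needed to type an essay of average length
typing_seconds = mean_chars_written / typing_speed

# Time left to recall and relive a suitable situation
seconds_left = median_seconds_on_page - reported_read_write_seconds

# Inferred, not reported: reading time and instruction length implied by the 118 s
implied_reading_seconds = reported_read_write_seconds - typing_seconds
implied_instruction_chars = implied_reading_seconds * reading_speed

print(round(typing_seconds, 1), seconds_left,
      round(implied_reading_seconds, 1), round(implied_instruction_chars))
```

Typing alone accounts for roughly four fifths of the 118 seconds, so the estimate of remaining time depends mostly on the assumed typing speed.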
In order to enhance the effectiveness of the episodic priming manipulation, a reviewer suggested freezing the survey on the manipulation page and asking participants to visualize the respective situation (e.g., by adding the prompt to picture the faces of the other people involved, to imagine talking with the person and to try to feel the other people there with them). We agree that these additional instructions may serve to intensify the priming manipulation. Another explanation for our inability to replicate the effect of power on self-control could be a weak explicit concept association between social power and self-control. Given that priming occurs as a result of spreading activation of related concepts in memory, larger effects should be found for strongly associated concept pairs (Salomon, 2016). We have reason to believe that the concepts of power and self-control might be only weakly associated (data from an unrelated pilot study).
In summary, as the power manipulation used here did not affect felt power, it seems fair to begin by questioning the superiority of the episodic priming paradigm rather than by concluding that there is no effect of social power on self-control. Perhaps the claim that "[a]ll the ways of manipulating power seem to have similar effects" (Smith & Galinsky, 2010, p. 928) should be evaluated more systematically, either through meta-analysis or via pre-registered comparisons. The alternative hypothesis would be that some of the effects of power require more intense feelings of power and/or conscious awareness of being in a powerful or powerless situation. Future research using different manipulations and operationalizations of both constructs is needed to clarify this effect and help adjudicate between the opposite predictions regarding the direction of the effect. The present research suggests that episodic priming may not be an ideal vehicle for this effort.