Searching for the Backfire Effect: Measurement and Design Considerations

One of the most concerning notions for science communicators, fact-checkers, and advocates of truth is the backfire effect: a correction leads an individual to increase their belief in the very misconception the correction aims to rectify. There is currently a debate in the literature as to whether backfire effects exist at all, as recent studies have failed to find the phenomenon, even under theoretically favorable conditions. In this review, we summarize the current state of the worldview and familiarity backfire effect literatures. We subsequently examine barriers to measuring the backfire phenomenon, discuss approaches to improving measurement and design, and conclude with recommendations for fact-checkers. We suggest that backfire effects are not a robust empirical phenomenon, and that more reliable measures, more powerful designs, and stronger links between experimental design and theory could greatly help move the field ahead.

Furthermore, research has failed to show backfire effects systematically in the same subgroup, so practitioners should not avoid giving corrections to any specific subgroup of people. Finally, avoiding the repetition of the original misconception within the correction appears to be unnecessary and could even hinder corrective efforts. However, misinformation should always be clearly and saliently paired with the corrective element, and needless repetitions of the misconceptions should still be avoided.
Keywords: Backfire effects, Belief updating, Misinformation, Continued influence effect, Reliability

One of the most concerning notions for science communicators, fact-checkers, and advocates of truth is the backfire effect. A backfire effect occurs when an evidence-based correction is presented to an individual and they report believing even more in the very misconception the correction is aiming to rectify (Lewandowsky, Ecker, Seifert, Schwarz, & Cook, 2012). This phenomenon has extremely important practical implications for fact-checking, social media, and all corrective communication efforts. However, there is currently a debate in the literature as to whether backfire effects exist at all, as recent studies have failed to find them, even under theoretically favorable conditions. In this article, we discuss the current state of the worldview and familiarity backfire effect literatures, examine barriers to measuring the correction of misinformation, and conclude with recommendations for fact-checkers and communicators.

Definitions
There are numerous barriers to changing inaccurate beliefs after corrections have been presented. The continued influence effect occurs when individuals still use inaccurate information in their reasoning and memory after a credible correction has been presented (Johnson & Seifert, 1994; Lewandowsky et al., 2012). There is also belief regression, where individuals initially update their belief after being exposed to the correction, but this belief change is not sustained over time (Berinsky, 2017; Kowalski & Taylor, 2017). In contrast to the backfire effect, with these barriers people at least still update their beliefs in the direction intended by the correction. The term backfire effect only pertains to cases where a correction inadvertently increases misinformation belief relative to a precorrection or no-correction baseline. It has also been referred to as the boomerang effect (Hart & Nisbet, 2012) or backlash (Guess & Coppock, 2018).
Two backfire effects have gained popularity in the literature: the worldview backfire effect and the familiarity backfire effect. These both result in increased belief after a correction yet are thought to have different psychological mechanisms. The worldview backfire effect is said to occur when people are motivated to defend their worldview because a correction challenges their belief system. It is more likely to occur with items that are important to the individual, such as politicized "hot-button" issues or information that the individual believes in strongly (Flynn, Nyhan, & Reifler, 2017; Lewandowsky et al., 2012). In contrast to the mechanisms of the worldview backfire effect, the familiarity backfire effect is presumed to occur when misinformation is repeated within the correction (Schwarz, Sanna, Skurnik, & Yoon, 2007). 1 For example, if one were to try to correct a misconception and stated that "eating apricot seeds does NOT cure cancer," the correction repeats both "apricot seeds" and "curing cancer," thus making the original misinformation more familiar. This increased familiarity is problematic because people are more likely to assume that familiar information is true, a phenomenon called the illusory truth effect (Begg, Anas, & Farinacci, 1992). In other words, this boost in familiarity when correcting misinformation is thought to be sufficient to increase the acceptance of the misinformation as true, even though it is paired with a retraction.

Worldview Backfire Effect
The logic behind the worldview backfire effect stems from the motivated reasoning literature, where one's ideology and values influence how information is processed (Kunda, 1990;Wells, Reedy, Gastil, & Lee, 2009), and information that counters preexisting beliefs is evaluated more critically than belief-congruent information (Taber & Lodge, 2006). A possible reason for the backfire effect is that people generate counter-arguments consistent with their pre-existing views to contradict the new information or correction (Nyhan & Reifler, 2010).
The landmark paper regarding the worldview backfire effect is Nyhan and Reifler (2010). Their first experiment corrected the misconception that weapons of mass destruction were found in Iraq during the 2003 invasion. Liberal individuals, whose worldview aligned with the correction, were able to successfully update their belief, whereas conservatives increased their belief in the misconception. Although Nyhan and Reifler's second experiment failed to replicate the backfire effect for this item with the conservative group as a whole, they did find the phenomenon in a subset of conservative respondents who rated Iraq as the most important problem facing the country at that point in time. The authors suggested that backfire effects may only occur when a belief is strong and the issue is currently connected with an individual's political identity. This suggestion aligns well with subsequent research demonstrating that worldview backfire effects have almost exclusively been found in either political or attitudinal subgroups, rather than communities as a whole. One major problem is that beyond the scientific literature, the media and online science blogs have often overgeneralized backfire effects found in subgroups to the population as a whole and to all corrective information (e.g., Science, 2017).
1 There are several studies that suggest that people misremember false information to be true more often than they misremember true information to be false (Peter & Koch, 2016; Skurnik, Yoon, Park, & Schwarz, 2005). Although this asymmetry could indeed stem from a familiarity process, this does not meet the criteria of a backfire effect. See Appendix A for details regarding articles that are frequently cited in support of backfire effects but do not meet backfire criteria.
There have subsequently been worldview backfire effects reported in a variety of subgroups with misinformation regarding vaccines (in respondents with least favorable vaccine attitudes, Nyhan, Reifler, Richey, & Freed, 2014; in respondents with high levels of concern about vaccine side effects, Nyhan & Reifler, 2015), climate change (in Republican participants, Hart & Nisbet, 2012; in Republicans with high political interest, Zhou, 2016), the existence of death panels (in politically knowledgeable Palin supporters, Nyhan, Reifler, & Ubel, 2013), and with a fictitious scenario detailing that right-wing politicians generally misappropriate public funds more than left-wing politicians (in right-wing attentive participants, Ecker & Ang, 2019), see Appendix B. In addition to observing backfire effects in widely varying subgroups, a further complication is that the dependent variable has also varied substantially between studies. These dependent variables roughly fall into three categories: belief in or agreement with a claim (e.g., Nyhan & Reifler, 2010), behavioral intentions (e.g., Nyhan and Reifler, 2015), or use of misinformation when answering inference questions (e.g., Ecker & Ang, 2019).
Regardless of the dependent variable used, failures to find or replicate previously observed backfire effects have been widespread (e.g., Garrett, Nisbet, & Lynch, 2013;Nyhan, Porter, Reifler, & Wood, 2019;Schmid & Betsch, 2019;Swire, Berinsky, Lewandowsky, & Ecker, 2017;Swire-Thompson, Ecker, Lewandowsky, & Berinsky, 2019;Weeks, 2015;Weeks & Garrett, 2014), even when using identical items that previously elicited the phenomenon. For example, Haglin (2017) used identical methods and vaccine-related items to those from Nyhan and Reifler (2015) and failed to find any evidence of a backfire effect. The largest failure to replicate to-date was by Wood and Porter (2019), conducting five experiments with over 10,000 participants. The items were specifically chosen to be important ideological issues that would be theoretically conducive to a worldview backfire effect. The authors found that out of 52 issues corrected, no items triggered a backfire effect. Much of the literature has interpreted these failures to replicate to indicate that either (a) the backfire effect is difficult to elicit on the larger group level, (b) it is extremely item-, situation-, or individual-specific, or (c) the phenomenon does not exist at all. See Appendix B for details regarding which studies found a worldview backfire effect, which did not, and the dependent variable(s) used in each.

Familiarity Backfire Effect
In contrast to the ideological mechanisms behind the worldview backfire effect, familiarity backfire effects are often presumed to occur due to the correction increasing the misinformation's processing fluency. In other words, the correction of "apricot seeds do NOT cure cancer" increases the ease in which "apricot seeds" and "cancer" are retrieved and processed (Schwarz, Newman, & Leach, 2016). However, the specific mechanisms of how repetition could lead to an increase in perceived truth are currently under debate (see Unkelbach, Koch, Silva, & Garcia-Marques, 2019, for a review). Furthermore, the familiarity backfire effect has often been conflated with the more well-established illusory truth effect. The former refers to increasing belief due to information repetition within a correction and has little to no empirical support, whereas the latter refers to increasing belief due to information repetition in the absence of a correction and is a robust empirical phenomenon (Fazio, Brashier, Payne, & Marsh, 2015).
The original notion of the familiarity backfire effect stems from an unpublished manuscript (Skurnik et al., 2007, as cited in Schwarz et al., 2007) where participants who viewed a flyer with "myth vs. fact" information regarding the flu vaccine reported less favorable attitudes toward vaccination than those who did not view the flyer. Although this paper is highly cited (e.g., Berinsky, 2017; Cook, Bedford, & Mandia, 2014; Gemberling & Cramer, 2014; Lilienfeld, Marshall, Todd, & Shane, 2014; Peter & Koch, 2016; Pluviano, Watt, Ragazzini, & Della Sala, 2019), it is difficult to evaluate given that it remains unpublished. There have been failures to directly replicate this study (Cameron et al., 2013), and the phenomenon has not been elicited under theoretically conducive circumstances, including a three-week delay between corrections being presented and belief being measured. Furthermore, since worldview backfire effects have been demonstrated using vaccine stimuli (e.g., Nyhan et al., 2014), it is unclear whether the Skurnik et al. (2007) backfire effect was due to worldview or familiarity mechanisms. This potential misattribution also applies to Pluviano, Watt, and Della Sala (2017), Pluviano et al. (2019), and Berinsky (2017), where the backfire effects were reportedly due to familiarity mechanisms yet could have been due to worldview since the experiments exclusively used politicized information. See Appendix C for details regarding which studies found a familiarity backfire effect, which did not, and the dependent variable(s) used in each study.
There have also been recent findings that do not align with the familiarity backfire notion. For instance, simply tagging misinformation as false, with no further explanation as to why it is false, has been shown to substantially reduce belief, both relative to a pre-correction within-subject baseline and in comparison to a control group who did not receive a correction at all (Ecker, O'Reilly, Reid, & Chang, 2019). Furthermore, if the familiarity backfire effect were genuine, then a practical recommendation would be to avoid repeating the misconception when presenting the correction. However, a recent meta-analysis of 10 studies found that there was no significant difference in belief updating when comparing whether or not the initial misconception was repeated within the correction (Walter & Tukachinsky, 2019). 2 Several recent studies not included in this meta-analysis found that repeating the misconception immediately prior to the correction facilitated belief updating (Carnahan & Garrett, 2019; Ecker, Hogan, & Lewandowsky, 2017), and that explicitly repeating misinformation prior to the correction is more effective than only implying it (Rich & Zaragoza, 2016). Although these findings collectively oppose the familiarity backfire notion, they align well with theoretical accounts that the co-activation of the misconception and corrective information facilitates knowledge revision (Kendeou & O'Brien, 2014). It is possible that pairing the misconception and correction increases the likelihood that people notice discrepancies between the two, facilitating the integration of new information into their existing mental model (Elsey & Kindt, 2017; Kendeou, Butterfuss, Van Boekel, & O'Brien, 2017).
Finally, the illusory truth effect and familiarity backfire effect are thought to rely on the same mechanisms, and evidence suggests that the illusory truth effect can be eliminated when veracity is made salient to participants. Brashier, Eliseev, and Marsh (2020) found that when participants were simply asked to rate statements for accuracy, the illusory truth effect was wiped out. In other words, if participants knew that the item was false, the illusory truth effect was not elicited if participants were instructed to focus on accuracy both immediately and after a two-day period (also see Rapp, Hinze, Kohlhepp, & Ryskin, 2014). In sum, although repeated exposure to misinformation alone certainly increases belief, the weight of evidence suggests that this rarely, if ever, occurs when the misinformation is paired with a clear and salient correction. It remains theoretically possible that there are circumstances where the familiarity boost of the misconception outweighs the corrective element (for example, when attention is divided, Troyer & Craik, 2000), but this has not been observed empirically.
Future research can more specifically investigate how familiarity boosts that increase belief and corrections that decrease belief interact. For instance, Pennycook, Cannon, and Rand (2018) found that the increase in belief due to a single prior exposure of fake news was approximately equivalent to the reduction of belief when the fake news was accompanied by a "disputed by third-party fact-checkers" tag.

Measurement and Design Considerations
The above review suggests that backfire effects are not a robust empirical phenomenon, and it could be the case that they represent an artifact of measurement error. Misinformation is still a relatively new field, and more reliable measures and more powerful designs are needed to move the field ahead and determine the fate of backfire effects. Here we suggest some experimental and theoretical steps that could improve the quality of the evidence. In particular, we suggest that future studies should carefully consider measurement reliability, use more powerful designs with greater internal validity when possible, be aware of sampling and subgroup issues, and take care in linking measures with particular theories. The recommendations below could also be applicable to misinformation studies in general, rather than only to studies that specifically examine backfire effects.
2 ... and Chan (2011, study 1), Nyhan and Reifler (2015), Cobb, Nyhan, and Reifler (2013, study 1), Huang (2017, study 1 and 2), Thorson (2013, study 3; ...

Reliability
Reliability is defined as the consistency of a measure, that is, the degree to which a test or other measurement instrument is free of random error, yielding the same results across multiple applications to the same sample (VandenBos, 2007). Although other areas of psychology have been highly focused on measuring the reliability of their assessments (e.g., individual differences, neuropsychology, attitude research 3 ), this has largely not been the case with misinformation science. A common methodological weakness in this area is the reliance on a single item to measure a belief or agreement. Single items are noisy and often have poor reliability (Jacoby, 1978;Peter, 1979), and under these conditions statistical significance may convey very little information (Loken & Gelman, 2017). Given that 81% of backfire effects found in our review of the worldview and familiarity literatures are found with single item measures, we must consider that poor item reliability could be contributing to this phenomenon. See Appendices B and C for details regarding the number of items in the measures of each study. Indeed, we found that the proportion of backfire effects observed with single item measures (37%) was significantly greater than those found in multiple item measures (8%; Z = 2.96, p = .003).
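As an illustration of the kind of comparison reported above, a pooled two-proportion z-test can be sketched as follows. This is a minimal sketch under stated assumptions: the pooled-variance form is one common choice and is not claimed to be the exact procedure behind the reported Z value, and the counts passed to the function are hypothetical placeholders, not the data from Appendices B and C.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic with a two-sided p-value."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts for illustration only (not the review's data):
# backfires observed in single-item vs. multi-item measures.
z, p = two_proportion_z(hits_a=15, n_a=40, hits_b=3, n_b=40)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z here means the first group (single-item measures in this hypothetical) shows the higher backfire rate.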
3 Multi-item scales have long been popular in measuring attitudes (Edwards, 1983; Likert, 1974). The difference between "attitudes" and "belief" is often difficult to discern, but previous work has roughly defined attitudes as affective and beliefs as cognitive (Fishbein & Raven, 1962). We should be able to take inspiration from such attitude scales and develop items to measure how people consider the veracity of an item. For example, the belief that "listening to Mozart will make an infant more intelligent" could also be measured by asking participants whether they believe that "classical music has a unique effect on the developing prefrontal cortex". Inspiration can also be taken from studies that use "inference questions", where participants are required to use their belief in judgment tasks. For example, "If one twin listened to Mozart every night for the first 10 years of their life, and another twin was not exposed to classical music at all, how likely is it that they will have a different IQ?" or "Listening to Mozart every evening for 3 years will increase a child's IQ by what percent?"
Quantifying item-level reliabilities could greatly aid in interpretation, given that a backfire effect observed on a more reliable item would have greater meaning than if found on an unreliable item. Perhaps the simplest approach to measure reliability for a single item is to include a test-retest group where participants rate their beliefs and then re-rate them after an interval has passed. This approach can be done in a control group or during a pre-baseline period in a waitlist design, if effects are expected to be extremely sample-specific. Although multi-item measures are typically more reliable than single item measures, there are occasions where single items can be sufficiently reliable (Sarstedt & Wilczynski, 2009). It is typically recommended that single-item test-retest reliability should be ≥.70 (Nunnally, 1978; Wanous & Hudy, 2001). Unfortunately, because so few studies in the field of misinformation have reported any measure of reliability, it is hard to know which, if any, of the items have sufficient reliability to adequately measure backfire effects.
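The test-retest check described above can be sketched in a few lines. This is a minimal illustration that assumes reliability is estimated as the Pearson correlation between the two rating occasions; the function names and the ratings are hypothetical, and only the .70 threshold comes from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def is_sufficiently_reliable(time1, time2, threshold=0.70):
    """True if test-retest correlation meets the recommended threshold."""
    return pearson_r(time1, time2) >= threshold


# HYPOTHETICAL belief ratings (0-10) from a test-retest control group.
time1 = [2, 5, 7, 3, 9, 6, 4, 8]
time2 = [3, 5, 6, 2, 9, 7, 5, 8]
print(round(pearson_r(time1, time2), 2), is_sufficiently_reliable(time1, time2))
```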
Implementing multi-item measures could both produce more reliable measures and inspire confidence that we are measuring generalizable constructs (e.g., whether items are perceived to be important or "hot-button" issues) rather than item-specific effects. One noteworthy study by Horne, Powell, Hummel, and Holyoak (2015) incorporated a 5-item scale (reliability: α = .84) to measure vaccine attitude changes, which correlated well with whether parents have ever refused a vaccination (r = −0.45). Notably, these data were subsequently reanalyzed by another group and interpreted at the individual item level because they thought the items represented separate constructs (Betsch, Korn, & Holtmann, 2015). This example not only shows that a multiitem measure can be highly reliable, but also demonstrates the challenges of creating a widely accepted multi-item measure and the current bias in the field toward analyzing individual items. In light of the replication crisis in psychology and beyond (Open Science Collaboration, 2015), all fields in the behavioral sciences have begun to focus more on measurement reliability, and a greater consideration of this issue in misinformation research could substantially improve the interpretation and replicability of our findings, particularly with regards to backfire effects.
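For readers less familiar with the α statistic mentioned above, the following is a minimal sketch of Cronbach's alpha computed from item-level responses. The data layout (one list per item, one entry per participant) and the example ratings are assumptions made for illustration and are unrelated to the Horne et al. (2015) data.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: list of items, each a list of scores with one entry
    per participant (all items must cover the same participants).
    """
    k = len(item_scores)
    item_variances = [statistics.variance(item) for item in item_scores]
    # Total scale score per participant (sum across items).
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# HYPOTHETICAL responses of six participants to a three-item belief scale.
items = [
    [1, 3, 4, 5, 2, 4],
    [2, 3, 5, 5, 1, 4],
    [1, 2, 4, 4, 2, 5],
]
print(round(cronbach_alpha(items), 2))
```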
A related issue in measuring the reliability of beliefs is that some beliefs may be stronger and more well-formulated than others. Less formulated beliefs may themselves be highly variable independent of measurement error. One approach to addressing this is to use several items to measure participants' within-belief consistency (e.g., higher consistency would indicate more formulated beliefs) as well as explicitly asking participants to rate how well-formed they perceive their beliefs to be.
A final measurement issue that could influence backfire effects is that unreliable measures have more random error and are more susceptible to regression to the mean, where extreme values at baseline testing become less extreme at follow-up testing (Bland, 2017). A regression-to-the-mean effect may be particularly problematic for individuals or subgroups in pre-post design studies who report low pre-correction belief, given that the effect could increase post-correction belief. Thus, this phenomenon could potentially result in spurious backfire effects. In Figure 1 we plot simulated data to illustrate this point. Panel A shows test-retest data for an item with poor reliability (Pearson's r = .40) whereas Panel B shows test-retest data for an item with good reliability (Pearson's r = .85). Note that at retest, data points at the extremes in the unreliable measure move more toward the mean from the line of equality (line where Time 1 = Time 2) compared to the reliable measure. Panels C and D shift these data points down 2.5 points as if a correction has been elicited. The gray area represents the "backfire zone" where post-correction belief is higher than pre-correction belief. Participants with low pre-correction belief are more likely to be found in the backfire zone when the item is unreliable (Panel C) than when it is reliable (Panel D). Though this is an oversimplified example, it shows how poor reliability and regression to the mean can give rise to spurious backfire effects in individuals and subgroups. It should be noted that effects of regression to the mean can be substantially mitigated by limiting exploratory subgroup analyses as well as including a well-matched test-retest or placebo group for comparison.
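The logic of this illustration can be reproduced with a short simulation. The sketch below is not the code used to generate Figure 1; it assumes normally distributed true beliefs, independent measurement error at each time point, and a correction that lowers every participant's true belief by the same amount, so that any post-correction increase is spurious by construction. All parameter values other than the two reliabilities and the 2.5-point shift are hypothetical, and scores are not clipped to a rating scale.

```python
import math
import random


def spurious_backfire_rate(reliability, n=20000, shift=2.5,
                           mean=5.0, sd_total=2.5, seed=1):
    """Share of low-baseline participants whose observed belief rises
    after a correction that truly lowers everyone's belief by `shift`."""
    rng = random.Random(seed)
    # Split observed variance into true-score and error components so
    # that reliability = true variance / total variance.
    sd_true = sd_total * math.sqrt(reliability)
    sd_error = sd_total * math.sqrt(1.0 - reliability)
    pairs = []
    for _ in range(n):
        true_belief = rng.gauss(mean, sd_true)
        pre = true_belief + rng.gauss(0.0, sd_error)
        post = true_belief - shift + rng.gauss(0.0, sd_error)
        pairs.append((pre, post))
    # Subgroup with the lowest third of pre-correction scores.
    pairs.sort(key=lambda pair: pair[0])
    low_group = pairs[: n // 3]
    backfires = sum(1 for pre, post in low_group if post > pre)
    return backfires / len(low_group)


rate_unreliable = spurious_backfire_rate(0.40)
rate_reliable = spurious_backfire_rate(0.85)
print(f"r = .40: {rate_unreliable:.3f}   r = .85: {rate_reliable:.3f}")
```

Under these assumptions, the low-baseline subgroup lands in the "backfire zone" more often when the item is unreliable, even though no participant's true belief increased.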

Experimental Design
In terms of design, studies of the backfire effect have varied widely, with most examining between-groups differences of correction versus no-correction groups (using 5-point scales, Garrett et al., 2013; Nyhan & Reifler, 2010; 7-point scales, Wood & Porter, 2019; Zhou, 2016; percentages of participants accepting, rejecting, or unsure about the misinformation, Berinsky, 2017; or counting the mean number of references to corrected misinformation after reading a fictitious scenario, Ecker & Ang, 2019). In these studies, participants are typically randomly assigned to treatment or control, and participants' beliefs are only measured at one time point, with the experimental group being assessed after the correction. In addition to these post-only with control studies, a handful of studies have used within-subject pre versus post correction differences (using 11-point belief scales, Aird, Ecker, Swire, Berinsky, & Lewandowsky, 2018; Swire-Thompson et al., 2019), though nearly all have lacked test-retest control groups (for an exception, see Horne et al., 2015). Other studies have used idiosyncratic approaches such as performing qualitative interviews (Prasad et al., 2009). 4
Post-test only with control designs have an advantage in that they may be more practically feasible, often only requiring a single testing session. Another advantage of this design is that researchers are able to test belief without a further familiarity boost, which is potentially important for studies attempting to examine the familiarity backfire effect. Post-test only with control designs are also thought to limit carryover effects associated with pre-post designs, although it is questionable whether carryover effects are a concern in misinformation studies.
If carryover effects were problematic, participants in pre-post studies would provide post-correction responses that are similar to their initial response, and the belief change observed in these designs would be significantly smaller than in post-test only with control designs. However, effect sizes of belief change in pre-post studies are similar in magnitude to those in post-test only with control designs, suggesting that the impact of carryover effects is likely minimal. In fact, larger decreases in belief in false claims have been found when the manipulation was within-subjects rather than between-subjects. Furthermore, effect sizes for belief change in pre versus post-correction studies are typically large, especially in conditions where there is no delay between pre- and post-test, where one would expect carryover effects to be most pronounced.
4 Prasad et al. (2009) developed a technique called "challenge interviews" where interviewers presented participants with substantive challenges to their political opinions. The authors do not claim that their findings constitute a backfire effect, but the paper is frequently cited in support of the worldview backfire phenomenon. They found that the most popular strategy was "attitude bolstering", bringing facts that support one's position to mind without directly refuting the contradictory information. Belief change was not measured.

Figure 1.
Simulated data of 30 participants on a misinformation item. Panel A illustrates test-retest data for an item with poor reliability (r = .40) and Panel B for an item with acceptable reliability (r = .85). The dotted lines in Panels A and B represent the line of equality, on which all data points would lie if the measurements at Time 1 and Time 2 were identical. Note greater regression-to-the-mean effects in the unreliable data than the reliable data, indicated by the arrows. Panels C and D shift these data points down 2.5 points as if a correction has been elicited. The gray area represents the "backfire zone." Panel C shows pre/post data demonstrating that subjects reporting low pre-correction beliefs may be more likely to show spurious backfire effects if the measures are unreliable.

An important issue with designs such as the post-test only with control is that even with randomization the groups may not be adequately matched at baseline (Morris, 2008). This issue may be particularly problematic with smaller sample sizes. The prevalence of this problem is hard to assess because many studies fail to report important demographic and political characteristics and rarely compare these variables between experimental and control groups. It is easy to imagine small group differences in demographics and political polarization producing the appearance of backfire effects. Pre-post designs are a viable alternative to post-test only designs and are not as affected by sampling issues. However, pre-post designs without a control group could also potentially suffer from problems related to repeated testing effects such as regression to the mean (Vickers & Altman, 2001), which could drive backfire effects, particularly at the subgroup level (see Figure 1).
A more powerful design that can overcome many of these issues is a pre-post control group design, where participants are randomly assigned to intervention or control conditions, and participants are measured both before and after the intervention or control is administered (Schulz, Altman, & Moher, 2010). This design is common in clinical intervention studies, and compared to post-test only between-subject designs, it offers a boost in statistical power. Further, because the experimental manipulation is within-subjects, the internal validity of this design does not solely depend on random assignment (Charness, Gneezy, & Kuhn, 2012). One important question for this design with regards to the backfire effect is what the best control condition would be. Though a test-retest control is a starting point, a more sophisticated approach would be to employ one or multiple placebo conditions (e.g., where participants are given information related to the misinformation that neither corrects nor confirms it). The only study that we are aware of that has used this design in the context of backfire effects is Horne et al. (2015). They compared vaccine attitudes in 315 adults before and after random assignment to either (a) autism correction, (b) disease risk information, or (c) reading an unrelated scientific vignette (control condition). Notably, they did not find backfire effects at the group level. When exploring subgroups, those with the least favorable attitudes toward vaccines were the most receptive to change. However, those with the most favorable attitudes to vaccines did show a backfire, which the authors interpreted as regression to the mean. Though this type of design may be more participant- and resource-demanding than post-only with control or pre-post designs, it could help provide a more powerful evaluation of the possible presence of backfire effects.
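To make the logic of the pre-post control group design concrete, the sketch below compares mean pre-to-post change in a correction group against mean change in a control group. It is a simplified illustration, not the analysis used by Horne et al. (2015): the function names and ratings are hypothetical, and a real study would add an inferential test (e.g., on the change scores) rather than report the raw difference alone.

```python
import statistics


def mean_change(pre, post):
    """Mean within-participant change (post minus pre)."""
    return statistics.mean([b - a for a, b in zip(pre, post)])


def correction_effect(corr_pre, corr_post, ctrl_pre, ctrl_post):
    """Change in the correction group beyond the change seen in control.

    Negative values indicate belief reduction attributable to the
    correction; a positive value would be consistent with a backfire.
    """
    return mean_change(corr_pre, corr_post) - mean_change(ctrl_pre, ctrl_post)


# HYPOTHETICAL belief ratings (0-10) for four participants per group.
corr_pre, corr_post = [6, 7, 5, 8], [3, 4, 3, 5]
ctrl_pre, ctrl_post = [6, 7, 5, 8], [6, 6, 5, 7]
print(correction_effect(corr_pre, corr_post, ctrl_pre, ctrl_post))
```

Subtracting the control group's change removes shifts that occur without any correction, such as regression to the mean or repeated-testing effects, which is the main advantage of this design over an uncontrolled pre-post comparison.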
We finally turn to demand characteristics and whether participants' expectations for how they are meant to behave facilitate backfire effects (Orne, 1959). Demand characteristics generally lead participants to be a "good subject," encouraging them to behave in a manner that confirms the hypothesis of the experimenter (Nichols & Maner, 2008). If the participant does receive cues that the experiment is about the efficacy of belief updating, they are likely to further reduce their belief after viewing a correction, rather than report increasing their belief. The only study, to our knowledge, that explicitly asked subjects about the purpose of the experiment in a debriefing questionnaire was conducted by Ecker, Lewandowsky, and Tang (2010), and they found that virtually all participants did indeed correctly assume that the experiment was about memory updating. It is nonetheless important that future studies quantify the extent to which demand characteristics influence misinformation experiments in general. Should future investigations deem them problematic, one method to reduce demand characteristics is to blind participants to the goals of the study and, for in-lab studies, blind experimenters (see Orne, 2009).

Sampling and Subgroup Issues
Another step forward for backfire studies is to be more aware of sampling and subgroup issues because the subgroups in which backfire effects have been found vary substantially. As we previously noted, the internal validity of between-groups post-test only designs can be seriously undercut by demographic differences between the groups, and more thorough between-groups demographic comparisons in these studies are essential. Further, though some studies that test for backfire effects have used previously defined subgroups (e.g., Ecker & Ang, 2019; Haglin, 2017), some of the subgroup analyses reported may have been post hoc. Post hoc subgroup analyses have been harshly criticized in the clinical literature (Wang, Lagakos, Ware, Hunter, & Drazen, 2007) because it is often unclear how many are performed and whether they are motivated by inspection of the data. Thus, in future studies, subgroup analyses derived from data inspection should be explicitly identified as exploratory (as done by Nyhan & Reifler, 2010), and subgroup analyses should be pre-specified, or better yet, pre-registered.
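One reason the number of subgroup analyses matters can be shown with a short calculation. The sketch below assumes that each subgroup test is independent and uses a per-test alpha of .05; both are simplifying assumptions made for illustration, and the function name is hypothetical. It computes the probability of at least one false-positive result when no true effect exists in any subgroup.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of one or more false positives across independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    which real subgroup analyses on overlapping samples rarely satisfy.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroup tests: {familywise_error_rate(k):.3f}")
```

Under these assumptions, ten unplanned subgroup tests already carry roughly a 40% chance of producing at least one spurious "significant" subgroup, which is why the number of analyses performed needs to be transparent.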

Practical Recommendations
Regarding the worldview backfire effect, fact-checkers can rest assured that it is extremely unlikely that, at the broader group level, their fact-checks will lead to increased belief in the misinformation. Meta-analyses have clearly shown that corrections are generally effective and backfire effects are not the norm (e.g., Chan, Jones, Hall Jamieson, & Albarracín, 2017; Walter & Murphy, 2018). Furthermore, given that research has yet to systematically show backfire effects in the same subgroups, practitioners should not avoid giving corrections to any specific subgroups of people. Fact-checkers can therefore focus on other known issues such as getting the fact-checks to the individuals who are most likely to be misinformed.
Regarding the familiarity backfire effect, avoiding the repetition of the original misconception within the correction appears to be unnecessary and could even hinder corrective efforts (Kendeou & O'Brien, 2014). We therefore instead suggest designing the correction first and foremost with clarity and ease of interpretation in mind. Although the familiarity backfire effect lacks evidence, we must be aware that the illusory truth effect in the absence of corrections or veracity judgments is extremely robust. Therefore, when designing a correction, the misinformation should always be clearly and saliently paired with the corrective element, and needless repetitions of the misconception should still be avoided. For instance, given that many individuals do not read further than headlines (Gabielkov, Ramachandran, Chaintreau, & Legout, 2016), the misconception should not be described in the headline alone with the correction in smaller print in the text below (Ecker, Lewandowsky, Chang, & Pillai, 2014). Adding the corrective element within the headline itself, even if it is simply a salient "myth" tag associated with the misconception, can be considered good practice.

Future Research
Although improvements in both experimental measures and designs are important, Oberauer and Lewandowsky (2019) highlight that another cause of poor replicability is weak logical links between theories and empirical tests. Future research could more explicitly manipulate key factors presumed to influence belief updating, whether it be fluency, perceived item importance, strength of belief, complexity of the item wording, order of corrective elements, internal counter-arguing, source of the message, or participants' communicating disagreement with the correction. Focusing on theoretically meaningful factors could help to better isolate the potential mechanisms behind backfire effects or the continued influence effect in general. Furthermore, it would be beneficial to be aware of other competing factors to avoid confounds. For example, when investigating the effects of familiarity, one could avoid exclusively using issues presumed to elicit worldview backfire effects (e.g., vaccines, Skurnik et al., 2007). Additionally, given that responses to corrections are likely heterogeneous, it would be beneficial to use a wide variety of issues in experiments that vary on theoretically meaningful criteria to dissociate when backfire effects occur and when they do not.
Future research should also empirically investigate common recommendations that stem from the familiarity backfire effect notion which have yet to be thoroughly examined. For example, it is unclear whether belief updating is fostered by presenting a "truth sandwich" to participants, stating the truth twice with the falsehood between (Sullivan, 2018). Preliminary findings suggest that a "bottom-loaded" correction, which first states the misconception followed by two factual statements, could be more effective than the truth sandwich (Anderson, Horton, & Rapp, 2019), although further research is required prior to firm recommendations being made.
Finally, there are additional occasions where corrections could be counter-productive that require empirical investigation. For instance, correcting facts in public political debate might not always be advisable, because it involves the acceptance of someone else's framing, allowing the person who promulgated the original falsehood to set the agenda (Lakoff, 2010; Lewandowsky, Ecker, & Cook, 2017). Furthermore, broadcasting a correction where few people believe in the misconception could be a legitimate concern, since the correction may spread the misinformation to new audiences (Kwan, 2019; Schwarz et al., 2016). For example, if the BBC widely publicized a correction to a misconception that its readership never believed to begin with, it would not reap the benefits of belief reduction, and those who do not trust this source may question its conclusion. The next crucial step is to examine such instances with real-world scenarios on social media or fact-checking websites.

Conclusion
In today's fast-paced information society, it is extremely important to understand the efficacy of corrections and the exact circumstances under which they have no impact or even backfire. The current state of the literature calls into question the notion of the backfire effect, and more rigorous studies are needed to determine if there are circumstances when these effects reliably occur. Indeed, given the current coronavirus pandemic and the rampant misinformation that has accompanied it, understanding the parameters of misinformation correction is particularly crucial. In sum, the current review suggests that backfire effects are not a robust empirical phenomenon, and more reliable measures, powerful designs, and stronger links between experimental design and theory could greatly help move the field ahead.

Author Contributions
BST conceived of the idea and wrote the majority of the manuscript. JD contributed sections to the manuscript, particularly the measurement and design considerations. BST, JD, and DL edited the manuscript.

Conflict of Interest Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
This project was funded by a Hewlett Packard Foundation grant. All authors acknowledge that the material presented in this manuscript has not been previously published and is not under consideration by any other journal. We thank Nicholas Miklaucic for proofreading and verifying the information in Appendices A, B, and C.

Appendix A. Papers frequently cited in support of backfire effects that do not meet the criteria of a backfire effect
Article | Type of backfire | Reason for exclusion
Skurnik et al. (2005) | Familiarity | This study did not meet the criteria of a backfire effect since post-correction belief is not compared to a pre-correction or no-correction baseline. They considered a backfire to be misremembering more false items as true than true items as false. For a critique, see .
Weaver, Garcia, Schwarz, & Miller (2007) | Familiarity | This study concerns the illusory truth effect, but since it does not provide corrections to participants it cannot comment on the backfire effect.
Prasad et al. | Worldview | This study consisted of qualitative interviews and belief change was not measured.
Peter and Koch (2016) | Familiarity | This study did not meet the criteria of a backfire effect since post-correction belief is not compared to a pre-correction or no-correction baseline. They considered the backfire to be misremembering more false items as true than true items as false.
Holman and Lay

Note. Dependent variables and studies that report backfire effects are highlighted in gray. We also highly recommend viewing Guess and Coppock (2018), although the findings are omitted from this table because they presented counter-attitudinal information rather than corrective information to participants.
a Trevors et al. (2016) found that self-concept negatively predicted attitudes after reading refutation text more than expository text, which the authors interpreted as evidence of a backfire effect. We exclude the attitude dependent variable from this table because there was no between-subject control group and the pre-post attitude tests could not be compared.

Note. Dependent variables and studies that report backfire effects are highlighted in gray.
a In addition to intent to vaccinate, Skurnik et al. (2007) also considered the misremembering of more myths thought to be true than facts thought to be false to stem from familiarity mechanisms. We have excluded this element from the table because it does not meet the criteria of a backfire effect, since it is not compared to a pre-correction or no-correction baseline.
b Berinsky (2017)'s second experiment found the trend that repetition of the misinformation prior to the correction made respondents less likely to reject it than if the misinformation was not repeated. However, we exclude this study since there is no pre-correction or no-correction baseline.
c We exclude the first experiment from Pennycook et al. (2018) because it was investigating the illusory truth effect. Since it did not provide corrections to participants it cannot comment on the backfire effect.