Statistical information is ubiquitous in nearly every area of modern life (Gigerenzer et al., 1989), and there is a proliferation of ways to communicate it in graphical terms (Kosslyn, 2006; Tufte, 1983). Arguably, one of the most common methods of communicating statistical information, particularly measures of central tendency such as the mean, is the bar graph. To date, the scientific study of bar graphs has focused largely on how best to draw them (Fischer, 2000; Gillan & Richman, 1994; Kosslyn, 2006; Zacks, Levy, Tversky, & Schiano, 1998), but here we focus on how they are naturally interpreted (Zacks & Tversky, 1999).

A mean value is computed via a process that is symmetric, in the sense that it will be influenced to the same degree by values that are located at equal distances above and below the mean. When a mean is depicted in a bar graph, however, that depiction is asymmetric: The mean itself is represented by the farthest edge of a bar that originates from one particular axis, most typically below the mean. The experiments reported here assessed whether this graphical asymmetry gives rise to a corresponding cognitive asymmetry.
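The symmetry of the mean can be checked in a few lines of Python. In this minimal sketch the sample values are hypothetical (they are not data from these experiments): moving any one value up by some distance d raises the mean by exactly d/n, and moving it down by d lowers the mean by the same amount, so values equidistant above and below the mean exert equal pull.

```python
from statistics import fmean

def mean_shift(values, index, delta):
    """Return how much the mean changes when values[index] is moved by delta."""
    moved = list(values)
    moved[index] += delta
    return fmean(moved) - fmean(values)

# Hypothetical sample whose mean is zero (illustrative values only).
sample = [-6.0, -2.0, 0.0, 3.0, 5.0]

up = mean_shift(sample, 2, +5.0)    # push one value 5 units above the mean
down = mean_shift(sample, 2, -5.0)  # push the same value 5 units below it
print(up, down)  # prints 1.0 -1.0: equal magnitude, opposite sign
```

Nothing in this computation depends on which axis a bar would be drawn from; the asymmetry lies in the depiction, not in the statistic.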

Previous researchers have stressed that what matters most for graph and diagram comprehension is not just the amount of ink on the page (cf. Tufte, 1983), but rather how the mind represents that ink in terms of its own natural units (Chabris & Kosslyn, 2005). In the case of bar graphs, those units are likely visual objects, which in turn automatically, and even irresistibly, constrain the allocation of attention and memory (for a review, see Scholl, 2001). Visual objects, in particular, are defined by the closure of their boundaries: Attention is attracted to such objects (Kimchi, Yeshurun, & Cohen-Savransky, 2007), here the bars themselves, but then does not flow beyond those boundaries (Egly, Driver, & Rafal, 1994; Marino & Scholl, 2005).

This may lead people who view bar graphs to reflexively attend to the bars, and so to mistakenly prioritize regions within the bars over equivalent regions outside the bars, even when this is not justified. If so, then when viewers are shown a bar graph that depicts a mean and are then asked to judge the likelihood that a particular value was part of its underlying distribution, they should judge points that fall within the bar as being more likely than points equidistant from the mean but outside the bar, as if the bar somehow “contained” the relevant data. We tested this possibility in a series of six experiments.

Overview of the experiments

The basic designs of the experiments were similar. Participants were presented with a bar graph containing a bar that depicted a mean value M of a measurement from some distribution D. (In Exps. 1 and 2, the bar was drawn in the context of an additional bar depicting another mean value to which M was being compared, and likelihood ratings for the points falling within and outside of the two bars were then tested equally often across participants. Exps. 3–5 tested only a single bar depicting a mean value of zero.) On a separate page or screen, participants were then shown a single test value T depicted on the same graph axes (with the bar itself either remaining or removed, depending on the particular experiment), and they were asked to judge the likelihood, on a 9-point scale (from 1 = very unlikely to 9 = very likely), that D contained this point. The primary comparison in Experiments 1–5 involved the reported likelihood values for pairs of possible values for T that were equidistant from M (one above and one below). The position of T was varied between participants, such that each participant made only a single judgment (except in Exp. 5, in which we examined the resulting bias in a within-subjects design).

Across several contexts, viewers did indeed judge points that fell within the bar as being more likely than points equidistant from the mean, but outside the bar. This misinterpretation occurred (a) for graphs with and without error bars, (b) for bars that originated from both lower and upper axes, (c) for test points with equally extreme numeric labels, (d) both from memory (when the bar was no longer visible) and in online perception (while the bar was visible during the judgment), (e) both within and between subjects, and (f) in populations including college students, adults, and online samples. In Experiment 6, we further explored how this bias can influence downstream decision making. We conclude by suggesting that this bias may have a real-world impact in several contexts of everyday life.

Experiment 1: Bars from below

We first demonstrated the “within-the-bar” bias for typical bar graphs (with the bars rising from a lower horizontal axis) via a paper-and-pencil test.

Method

A group of 76 undergraduates completed a pencil-and-paper survey packet. The undergraduates were recruited and tested on campus and were compensated with a chocolate bar.

The participants first observed a vertical bar graph with two bars originating from the lower x-axis. The bars depicted the mean percentages of residents who (at the time that the study was conducted) were classified as obese in the eastern versus the western United States. Each graph contained bidirectional error bars to emphasize that values from the underlying distributions could come from both above and below the means.

The actual means depicted by the bar graphs were the same across all conditions, and the data points appeared in the same spatial position on the page in all conditions. However, between participants, the y-axis labels were adjusted so that, for half of the participants, the bars depicted data points that were above the midpoint of the y-axis (as in Fig. 1a), while for the other half of the participants, the bars depicted data points that were below the midpoint of the y-axis (as in Fig. 1b). (Note that Fig. 1 provides a visual summary of the types of graphs seen by participants in Exps. 1 and 2, rather than depicting the actual stimuli; for example, the actual stimuli in Exps. 1 and 2 included error bars.) Full materials for all of the participant groups in all experiments are included in the supplementary materials.

Fig. 1

Individual bars depicting means with possible test points from the first two experiments. Test points were always presented in the center of the y-axis, and all conditions were tested between participants. Fig. 1a and b provide a visual summary of the types of graphs seen in Experiment 1, while Fig. 1a–d provide a visual summary of the types of graphs seen in Experiment 2. These were as follows: (a) A test point that appears below the initially depicted mean, such that it would have fallen within the initially presented rising bar. (b) A test point that appears above the mean, outside the rising bar. (c) A test point that appears below the mean, outside the falling bar. (d) A test point that appears above the mean, inside the falling bar.

Following a series of filler items, participants were then presented with a new page with the same graph axes and a single test point, drawn at the midpoint of the y-axis (as in Fig. 1). Participants were asked to judge the likelihood (on a 9-point scale) that this value was part of the distribution that had been depicted by the bar seen earlier. Because the y-axis labels were adjusted across conditions, this test point was below the mean for half of the observers, and thus was “within” the bars that they had seen earlier. For the other half of the observers, this test point was above the mean, and thus was “outside” the bars that they had seen earlier. Likelihood ratings were tested equally often for the “eastern states” and “western states” categories. Whether the test point appeared above the eastern or western states’ x-axis labels, as well as whether the test point was within or outside of the bar, varied across participants, such that each participant only ever made a single likelihood judgment.

Results and discussion

An independent-samples, two-tailed t test revealed that although the test point was equidistant from the initially depicted means, it was judged to be more likely to have come from the distribution when it was below the mean (such that it would have fallen within the bar; M = 5.05, SD = 2.97) than when it was above the mean (such that it would have fallen outside the bar; M = 3.00, SD = 2.87), t(74) = 3.06, p = .003. We dub this effect the within-the-bar bias.
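The reported statistic can be checked from the summary values alone. The sketch below computes a pooled-variance (Student) independent-samples t from means, SDs, and group sizes. The even split of 38 participants per condition is an assumption on our part, since only the total N = 76 is reported; with that split the result reproduces t(74) ≈ 3.06.

```python
from math import sqrt

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t and df from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Exp. 1 summary values: within the bar vs. outside the bar.
# n = 38 per group is an ASSUMED even split of the 76 participants.
t, df = pooled_t(5.05, 2.97, 38, 3.00, 2.87, 38)
print(f"t({df}) = {t:.2f}")  # prints t(74) = 3.06
```

Because the group sizes are assumed, this is a consistency check on the reported values rather than a reanalysis of the data.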

Experiment 2: Rising versus falling bars

The results from Experiment 1 suggested that people do in fact judge points that fall within the bar to be more likely than points equidistant from the mean but outside of the bar. However, the same pattern would also arise if viewers simply had a more general bias to favor points that are numerically below the mean. To unconfound these possibilities, we replicated Experiment 1 while adding an additional factor. In this study, participants made likelihood judgments either for bars ascending from a lower x-axis, with increasingly extreme positive numbers running up the y-axis (“rising” bars), or for bars descending from an upper x-axis, with increasingly extreme negative numbers running down the y-axis (“falling” bars). If the pattern observed in the previous study resulted from a within-the-bar bias, the bias should exist regardless of whether the bars are rising (and the point within the bar is numerically smaller than the mean value) or falling (and the point within the bar is numerically greater than the mean value). If, however, the pattern results from a more general bias to favor points below the mean, test points that are numerically smaller should be judged as being more likely throughout.

Method

This study tested a new sample of 161 adults (M age = 36.6). Participants were recruited while aboard a New England commuter ferry and were compensated with $2 for completing a paper-and-pencil survey packet.

The procedure was identical to that of Experiment 1, except as follows. Participants were presented with a new source of data (involving decreased housing prices in the eastern vs. the western United States) that could be naturally depicted by both rising and falling bars. The rising-bar conditions were identical to the graphs in Experiment 1. In the falling-bar conditions, the test point could either fall below the mean (now outside of the bar, as in Fig. 1c) or above the mean (inside the bar, as in Fig. 1d). The positions of the initially depicted bars were again manipulated between participants by adjusting the y-axis labels.

Results and discussion

We conducted a 2 × 2 ANOVA with Test Point (within vs. outside the bar) and Graph Type (rising vs. falling bars) as factors. As in Experiment 1, test points that fell within the bar again produced greater likelihood ratings than did those that fell an equal distance outside the bar, F(1, 157) = 8.66, p = .004. This effect held for both rising bars [Ms = 5.63 and 4.08, respectively; t(73) = 2.09, p = .04] and falling bars [Ms = 6.24 and 4.96, respectively; t(84) = 2.06, p = .043]. (Neither the absolute ratings nor the magnitudes of the within-the-bar bias differed by rising vs. falling bars, Fs < 1.) The within-the-bar bias is especially striking here, since the within and outside positions were reversed across these two sets of conditions: A test point below the mean, for example, fell within the bar for a rising graph but outside the bar for a falling graph.

Experiment 3: Numeric extremity?

The results of the previous two studies were consistent with the predicted within-the-bar bias. However, a second alternative explanation could be that participants assigned higher likelihood values to less extreme numeric labels. (As can be appreciated from the supplementary materials, the within-the-bar points in Exps. 1 and 2 necessarily had less extreme numeric labels than did the outside-the-bar points.) To show that the within-the-bar bias does not simply reflect a tendency to favor points with less extreme numeric labels, we replicated the effect in a situation in which the paired test points were equally extreme, centered around a mean of zero.

Method

A new sample of 236 adults (M age = 35.6) were recruited via a Yale website that hosts academic experiments, and they were compensated via entrance into lotteries for gift certificates to a web-based retailer.

This experiment differed from the previous studies in the following ways: (a) The y-axis labels were identical across all conditions; (b) the graphs contained only a single bar with no error bar (to minimize memory load); and (c) the graph depicted a new hypothetical source of the data (freezing points of chemicals in a science class) that was amenable to both positive and negative numbers centered at zero (with potential data points that could fall either above or below the mean of zero).

The instructions explained that the freezing points of 20 different unknown chemicals had been recorded. The mean freezing temperature for all 20 chemicals was zero. Thus, in all conditions the bar depicted a mean of zero. As in Experiment 2, across conditions this mean was depicted either with a “rising” bar (a bar ascending from a lower axis), or a “falling” bar (a bar descending from an upper axis).

On a subsequent page, participants were presented with the same axes and a single test point drawn either above (+5) or below (–5) the mean of zero. Thus, the probe value occurred at different points, depending on the condition. Specifically, the test point could fall above the mean and outside a rising bar (as in the upper point in Fig. 2a), below the mean and inside a rising bar (the lower point in Fig. 2a), above the mean and inside a falling bar (the upper point in Fig. 2b), or below the mean and outside a falling bar (the lower point in Fig. 2b). Each participant only ever made a single likelihood judgment.
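The condition logic in this design reduces to one rule: A test point is “within” the bar exactly when it lies between the axis the bar originates from and the mean. The sketch below encodes that rule. The axis positions of –10 and +10 are hypothetical stand-ins, since the actual axis limits appear only in the supplementary materials, and the function name is ours.

```python
def within_bar(point, mean, origin):
    """True if point lies between the bar's origin axis and the mean.

    A rising bar has its origin below the mean; a falling bar has it above.
    """
    low, high = sorted((origin, mean))
    return low <= point <= high

MEAN = 0
ORIGINS = {"rising": -10, "falling": +10}  # hypothetical axis positions

for graph, origin in ORIGINS.items():
    for point in (+5, -5):
        status = "inside" if within_bar(point, MEAN, origin) else "outside"
        print(f"{graph:7s} bar, test point {point:+d}: {status}")
```

The same numeric value (e.g., +5) switches between inside and outside depending only on the bar's origin, which is what allows this design to separate the bar from the numeric label.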

Fig. 2

Individual bars depicting means with possible test points, from Experiments 3 and 4. Bars always depicted a mean of zero, with test points that were equidistant from it and had equally extreme numeric labels. These were as follows: (a) Possible test points for the rising graph. (b) The same possible test points for the falling graph, reversing which point fell “within” the initially presented bar.

Results and discussion

We conducted a 2 × 2 ANOVA with Test Point (within vs. outside the bar) and Graph Type (rising vs. falling) as factors. As in the previous experiments, the results indicated that test points that originated from within the bar produced greater likelihood ratings than did those that originated from outside the bar, F(1, 232) = 26.37, p < .001. In this experiment (and in Exp. 4), we then compared the reported likelihood ratings for a given test point (e.g., +5) across rising and falling bars, thereby controlling for any differences based on the absolute spatial position of the test point on the figure. The results indicated that the test point above the mean (+5) was judged to be significantly more likely when the bar was falling to zero and contained the point (M = 7.17, SD = 2.43) than when the bar was rising to zero and did not contain the point (M = 5.68, SD = 3.39), t(117) = 2.76, p = .007. Conversely, the test point below the mean (–5) was judged to be significantly more likely when the bar was rising to zero and contained the point (M = 6.83, SD = 2.36) than when the bar was falling to zero and did not contain the point (M = 4.42, SD = 3.30), t(115) = 4.52, p < .001. [The absolute ratings did not differ by rising vs. falling bars, F(1, 232) = 1.46, p = .23, though the magnitude of the within-the-bar bias was larger for falling bars, F(1, 232) = 4.42, p = .04. Note that this asymmetry is in the opposite direction from that reported in Exp. 2; it is unclear what to make of this, however, given that the asymmetry in Exp. 2 was exceedingly small and its interaction term was not significant.] These results confirm that the within-the-bar bias cannot be explained in terms of the extremity of the numeric labels associated with the within-the-bar versus outside-the-bar comparison, since the effect persisted here when these values were equated.
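The bracketed note about opposite asymmetries can be read directly off the cell means. This sketch takes the within-bar and outside-bar means reported for Experiments 2 and 3 and computes the bias (within minus outside) for each graph type. It reproduces only the direction of each asymmetry; it is not a substitute for the ANOVA interaction test, which also depends on variances and group sizes.

```python
# Cell means (within the bar, outside the bar) as reported in the text.
CELL_MEANS = {
    "Exp. 2": {"rising": (5.63, 4.08), "falling": (6.24, 4.96)},
    "Exp. 3": {"rising": (6.83, 5.68), "falling": (7.17, 4.42)},
}

def bias(within, outside):
    """Within-the-bar bias: mean rating inside minus mean rating outside."""
    return within - outside

for exp, cells in CELL_MEANS.items():
    b = {graph: round(bias(*pair), 2) for graph, pair in cells.items()}
    larger = max(b, key=b.get)
    print(exp, b, "-> larger for", larger, "bars")
```

In Exp. 3 the within-bar point for falling bars is +5 and for rising bars is –5, which is why the pairs are assembled per graph type rather than per test value.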

Experiment 4: Free viewing

In each of the previous three experiments, the within-the-bar bias emerged when participants had to recall the bar graph(s) that they had seen earlier in order to report the likelihood of the given test point. Given how robust this bias was, we wondered whether it would persist when the bar itself remained visible during the test, so that there was no memory component at all. We tested this possibility here by assessing the within-the-bar bias during free viewing.

Method

We presented a new group of 259 online participants (M age = 35.0) with stimuli that were nearly identical to those of the previous experiment. In the new experiment, however, we asked about the likelihood of a test value that was either below the mean (–5) or above the mean (+5) while the graph was still present and in plain view. In this experiment, the test point was not visually depicted. Rather, participants responded to the following question: “What is the likelihood that one of the chemicals tested in Mrs. Meade’s class had a freezing point of 5° [or –5°] Fahrenheit?”

Results and discussion

We conducted a 2 × 2 ANOVA with Test Point (within vs. outside the bar) and Graph Type (rising vs. falling) as factors. As in the previous experiments, the results indicated that test points originating from within the bar produced greater likelihood ratings than did those originating from outside the bar, F(1, 255) = 8.24, p = .004. Specifically, the test value above the mean (+5) was judged to be significantly more likely when the bar was falling to zero and contained the point (M = 6.62, SD = 1.96) than when the bar was rising to zero and did not contain the point (M = 5.77, SD = 2.74), t(131) = 2.09, p = .039. Conversely, the test value below the mean (–5) was judged to be more likely when the bar was rising to zero and contained the value (M = 7.14, SD = 1.60) than when the bar was falling to zero and did not contain the value (M = 6.42, SD = 2.40), t(124) = 1.99, p = .049. [The absolute ratings did not differ by rising vs. falling bars, F < 1, though the magnitude of the within-the-bar bias was larger for rising bars, F(1, 255) = 4.54, p = .03.] This result confirms that the within-the-bar bias persists even when the bar itself is fully visible during free viewing, though (understandably) with a smaller magnitude, given that the graph remained in plain view.

Experiment 5: The bias within subjects

The previous experiments were all conducted between subjects, such that each individual participant only ever gave a single likelihood rating. We did this because we assumed that participants who had to produce both ratings (i.e., for points both within and outside of the bar) would be explicitly confronted with their seeming equivalence, and so would be more likely to produce equivalent responses in an attempt to be consistent, even if this conflicted with their initial intuitions. Given the strength of the previous tests of the within-the-bar bias, though, we decided to test this assumption empirically. The participants in this study were thus asked to report likelihood ratings for the points both above and below the mean.

Method

A new group of 201 participants (M age = 36.3), recruited from the same online panel, were exposed to stimuli that were nearly identical to those from Experiment 4. However, instead of varying the test point across participants, all participants reported likelihood ratings for both test points (+5 and –5), in counterbalanced order.

Results and discussion

The results from this study indicated that test values within the bar were judged as significantly more likely than test values outside the bar, F(1, 197) = 23.90, p < .001. Specifically, when the bars were “rising,” participants judged the test value below the mean (M = 7.07, SD = 1.89) to be significantly more likely than the test value above the mean (M = 6.35, SD = 2.48), t(98) = 3.37, p = .001. Conversely, when the bars were “falling,” participants judged the test value above the mean (M = 6.78, SD = 2.06) to be significantly more likely than the test value below the mean (M = 5.82, SD = 2.75), t(101) = 3.62, p < .001. (The absolute ratings did not differ on the basis of whether the positive vs. negative point was tested first, and this factor did not interact with the within-the-bar bias, both Fs < 1.)

This replication confirms that the within-the-bar bias is strong enough to survive even a task demand that makes it explicit that the two tested points are equidistant from the depicted mean. (In fact, 73% of the participants did rate the points as equally likely in this study. But the comparisons reported above nevertheless held, because the other 27% of participants, i.e., the minority who rated the two points differently, still reliably favored the point within the bar.)
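Why can a paired comparison remain significant when most participants give identical ratings? Tied pairs contribute difference scores of zero, so the test is carried by the minority who differ. The sketch below uses invented difference scores for 100 hypothetical participants (73 ties, plus 27 non-zero scores split 20 to 7 in favor of the within-bar point); these numbers are purely illustrative and are not the study’s data.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t(diffs):
    """One-sample t on difference scores (within-bar minus outside-bar)."""
    n = len(diffs)
    return fmean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# HYPOTHETICAL difference scores (illustration only, not study data):
# 73 rate both points equally, 20 favor the within-bar point by 1,
# and 7 favor the outside-bar point by 1.
diffs = [0] * 73 + [1] * 20 + [-1] * 7
t, df = paired_t(diffs)
print(f"t({df}) = {t:.2f}")  # prints t(99) = 2.57
```

The zeros shrink the mean difference but also shrink its standard error, so a consistent lean among the non-tied minority can still yield a reliable effect.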

Experiment 6: Downstream influences on decision making

Bar graphs are ubiquitous not only in scientific communication but also in the popular press. Given that people frequently make decisions on the basis of such information, the within-the-bar bias might also have downstream effects on decisions about the substance of the depicted data. Previous research has examined the effects of graphical displays on risk perception, but these studies have so far focused on demonstrating that the use of pictorial graphs at all can increase risk avoidance relative to numerical information (Chua, Yates, & Shah, 2006; Stone et al., 2003; Stone, Yates, & Parker, 1997). Here, in contrast, we sought to determine whether the specific way in which bar graphs are naturally interpreted (i.e., the within-the-bar bias) can also influence subsequent decision making in ways that could matter.

Method

A new group of 270 online participants (M age = 37.3) were asked to imagine a hypothetical scenario in which they were the CEO of a large tire manufacturer (as detailed in the supplementary materials). The scenario explained that the tire manufacturer had released a new type of tire and was performing a test of the tire’s safety (quantified in terms of a hypothetical measure, “BTS”). Participants were told that the ideal BTS score was zero and that a test of 30 tires had been performed, yielding a mean BTS score of zero. However, it was noted that some of the tires tested had BTS scores greater than zero, while others had scores less than zero.

Participants then viewed a bar graph, which depicted the mean of zero. This mean was depicted graphically via a bar rising from a lower x-axis (in the rising-bar condition), or by a bar descending from an upper x-axis (in the falling-bar condition). In a third, control condition, we presented only the written information and no bar graph.

The participants then read the following statement: “Based on this information, you can recall the existing tires (which will be costly) to either slightly increase the BTS of your new tires or slightly decrease the BTS of your new tires. In this case, I would . . .” Participants responded using a slider bar ranging from slightly decrease BTS levels to slightly increase BTS levels, with a midpoint labeled neither increase nor decrease BTS levels. The numeric values associated with these points were 0, 703, and 351, respectively (though these numerical values were not seen by participants).

Results and discussion

The results indicated that the presence of bar graphs influenced subsequent decision making in a manner consistent with the within-the-bar bias. A one-way ANOVA revealed a main effect of condition, F(2, 267) = 13.17, p < .001. Directional (one-tailed) tests revealed that the presence of a rising bar led participants in this condition (M = 380.18, SD = 95.41) to significantly increase BTS levels relative to control (M = 354.80, SD = 83.03), t(177) = 1.90, p = .029. Conversely, the presence of a falling bar led participants in this condition (M = 306.48, SD = 113.40) to significantly decrease BTS levels relative to control, t(178) = 3.26, p < .001, and a further comparison confirmed that the resulting BTS levels differed between the rising- and falling-bar graph conditions, t(179) = 4.73, p < .001. The control condition was not significantly different from the midpoint of 351 (p = .67), while both the rising- and falling-bar conditions were (both ps < .01).
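The three pairwise degrees of freedom pin down the group sizes, assuming pooled-variance tests with df = n1 + n2 - 2: the pairwise sums are 179, 180, and 181, which implies 90 (rising), 89 (control), and 91 (falling) participants, totaling 270. The sketch below derives those sizes and recomputes the pairwise t values from the reported means and SDs. The group sizes are our inference, not numbers stated in the text.

```python
from math import sqrt

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Reported pairwise dfs imply pairwise sample-size sums (df + 2).
rc, fc, rf = 177 + 2, 178 + 2, 179 + 2  # rising+control, falling+control, rising+falling
total = (rc + fc + rf) // 2             # each group is counted twice -> 270
n = {"rising": total - fc, "control": total - rf, "falling": total - rc}

# Reported (M, SD) for each condition.
stats = {"rising": (380.18, 95.41), "control": (354.80, 83.03),
         "falling": (306.48, 113.40)}

def t_between(a, b):
    return pooled_t(*stats[a], n[a], *stats[b], n[b])

print(n)  # prints {'rising': 90, 'control': 89, 'falling': 91}
for a, b in [("rising", "control"), ("control", "falling"), ("rising", "falling")]:
    print(f"{a} vs {b}: t = {t_between(a, b):.2f}")
```

The one-sample comparisons against the midpoint can be checked the same way; for the control condition, (354.80 - 351) / (83.03 / sqrt(89)) ≈ 0.43, in line with the reported p = .67 (again assuming n = 89).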

Thus, although the objective information provided to participants in this task gave them no reason to either increase or decrease BTS levels (a fact that the majority of participants in the control condition appeared to recognize), the bar graphs shifted their decisions: A rising bar led participants to increase BTS levels, while a falling bar led them to decrease BTS levels. This is the direction that the within-the-bar bias predicts. A bar rising to zero from below appears to “contain” negative values, suggesting that most tires scored below zero and should be adjusted upward, whereas a bar falling to zero from above suggests the opposite. The influence of the within-the-bar bias is therefore not limited to reasoning about bar graphs and statistical distributions per se; it can also shape decisions about the content that the bar graphs depict.

General discussion

The graphical misinterpretation documented here, which we refer to as the within-the-bar bias, held across wide variations in materials, display details, and subject populations (including data from over 1,200 participants in total). This bias itself is an inferential phenomenon, but it could readily influence a number of situations that matter, as when bar graphs depict means related to safety, economic, or medical information (as illustrated for safety information in Exp. 6).

Given the prevalence of bar graphs in both the media and scientific communication, the results presented here raise important questions about how bar graphs are naturally interpreted and the extent to which people may draw incorrect inferences about the nature of data that are presented via bar graphs. Bar graphs of means will doubtless remain ubiquitous, but in some situations means might be better represented with points rather than asymmetric bars—or, where possible, with depictions of the distributions themselves. We thus encourage researchers (and others) to consider these additional options for reports of central tendencies, and to use bar graphs only when the underlying values being depicted are inherently asymmetric (e.g., when depicting counts, ranges, or measures of extremity). This would constitute a sea change in how information is graphically depicted across many fields, but it would help to bring such tools into better alignment with the ways in which our minds automatically process and attend to graphs as visual objects.