How Many People Live in Political Bubbles on Social Media? Evidence From Linked Survey and Twitter Data

A major point of debate in the study of the Internet and politics is the extent to which social media platforms encourage citizens to inhabit online “bubbles” or “echo chambers,” exposed primarily to ideologically congenial political information. To investigate this question, we link a representative survey of Americans with data from respondents’ public Twitter accounts (N = 1,496). We then quantify the ideological distributions of users’ online political and media environments by merging validated estimates of user ideology with the full set of accounts followed by our survey respondents (N = 642,345) and the available tweets posted by those accounts (N ~ 1.2 billion). We study the extent to which liberals and conservatives encounter counter-attitudinal messages in two distinct ways: (a) by the accounts they follow and (b) by the tweets they receive from those accounts, either directly or indirectly (via retweets). More than a third of respondents do not follow any media sources, but among those who do, we find a substantial amount of overlap (51%) in the ideological distributions of accounts followed by users on opposite ends of the political spectrum. At the same time, however, we find asymmetries in individuals’ willingness to venture into cross-cutting spaces, with conservatives more likely to follow media and political accounts classified as left-leaning than the reverse. Finally, we argue that such choices are likely tempered by online news watching behavior.


Introduction
Does social media encourage the public to live in online ideological "echo chambers," consuming and sharing only information that is consistent with their political beliefs? This question has received increasing attention by political commentators and politicians. Some have claimed, for example, that such ideological echo chambers on social media are "destroying democracy" (El-Bermawy, 2016); explain "why Trump won and [we] didn't see it coming" (Baer, 2016); and create "angry, poorly informed partisans" (Lee, 2016). Former President Barack Obama has suggested, furthermore, that ideological echo chambers are now a critical issue for democracy. "One of the dangers of the Internet is that people can have entirely different realities," he warned. "They can be cocooned in information that reinforces their current biases" (Yeginsu, 2017). Much of the public discourse concerning social media appears to suggest a belief that online ideological echo chambers are both highly pervasive and deeply problematic for society.
In this article, we measure the ideological distribution of both Twitter accounts followed and tweets potentially seen at the individual level. We analyze data from a nationally representative survey of Americans with linked data on respondents' Twitter IDs, which allows us to collect the set of accounts that they followed and all tweets posted by those accounts. We quantify how many respondents live in online ideological "bubbles" based on their own self-reported ideology. We find a substantial amount of overlap in the ideological distributions of accounts followed by users on opposite ends of the political spectrum. In addition to this relative similarity in overall following patterns, however, many individuals' willingness to purposefully venture into challenging spaces is limited, although an analysis encompassing all potentially seen tweets shows approximately twice as much cross-cutting exposure, an effect that is somewhat larger when focusing on retweets from other accounts. The individual-level pattern also appears to be asymmetric: When compared with fixed points on the ideological continuum, conservatives are more likely to follow accounts at or to the left of MSNBC than liberals are to follow accounts at or to the right of Fox News, even though the measure of ideological slant we employ places the two outlets at roughly equidistant positions from the midpoint.
SAGE Open, research article, 2019. Eady et al. DOI: 10.1177/2158244019832705. Author affiliations: 1 New York University, New York City, USA; 2 Princeton University, NJ, USA.
For social scientists, the connection between social media and the prevalence of online ideological bubbles is not clear either theoretically or empirically. 1 The arguments concerning the role of the Internet and social media in the consumption of ideologically diverse content can lead to hypotheses that run in opposite directions, predicting both the presence and absence of ideological bubbles. News media in the United States during the last half-century has itself become increasingly ideologically diverse. Before the advent of the Internet, the news media landscape underwent a rapid shift from having three ideologically similar networks to a landscape with a wider range of diverse cable networks, some without any news content at all. This gave individuals the choice to opt in to news with a distinct ideological slant, or to opt out entirely (Prior, 2007). The adoption of broadband Internet and social media has led the news and political information landscape to be diversified further, with increased access to heterogeneous political information sources now available to the vast majority of the public. These changes have democratized both the production and consumption of political and social information (Benkler, 2006) and increased opportunities for the public to consume ideologically heterogeneous political information.
In addition to increasing choice, the advent of social media may also have led to an increase in incidental exposure to ideologically heterogeneous information (e.g., Barberá, 2015b; Brundidge, 2010; Feezell, 2018; Fletcher & Nielsen, 2018; Messing & Westwood, 2014). Although many people may not explicitly seek to consume such content, the sharing mechanism of social networks such as Facebook and Twitter may result in many among the public being exposed to political information they did not seek out, including information that is not consistent with their own ideological predilections. Thus the structure of social media could indirectly prevent ideological bubbles from forming. As a consequence, the relative ease of access to and abundant supply of ideologically diverse information sources, and the potential for indirect exposure to such content, lead to the hypothesis that relatively few among the public will inhabit ideological echo chambers online.
In contrast, while the above democratizing account of the Internet and social media is primarily a story of information supply, the story regarding the formation of ideological bubbles concerns information demand. Despite the relatively widespread availability of ideologically diverse information, individuals may nevertheless choose to selectively expose themselves only to material and individuals that are ideologically similar to themselves. Such an account is grounded, first, in the well-known empirical regularity of homophily, the tendency of individuals to associate with others who are similar to themselves (see McPherson, Smith-Lovin, & Cook, 2001), and second, in work demonstrating that the public frequently engages in selective exposure, the tendency to consume information that ideologically aligns with one's own political beliefs (see Stroud, 2010). Under this account, one might hypothesize that many among the public would be exposed only to an ideologically narrow range of content, a consequence both of the individuals they associate with online and the material they choose to consume.
To investigate these hypotheses, we build on recent work concerning online selective exposure (e.g., Bakshy, Messing, & Adamic, 2015; Flaxman, Goel, & Rao, 2016; Gentzkow & Shapiro, 2011; Guess, 2018; Messing & Westwood, 2014) by examining the degree to which exposure to diverse opinions and content on social media varies among individuals across the ideological spectrum. To do so, we use a large survey of social media users with Twitter accounts. Our focus is threefold. First, we examine the ideological distributions of the accounts that users follow, disaggregated by whether such accounts are (a) members of the news media, (b) part of the political class, or (c) non-elites (the general public). Second, we examine the ideological distributions of the tweets that users receive, overall and from each of these account types. Finally, we examine whether the retweets that users receive originate from accounts that are more ideologically diverse or more ideologically moderate than the tweets that users receive from those they follow directly. This permits us to examine the extent to which indirect exposure to people outside of one's immediate network serves as a mechanism to expand the diversity of the opinions and media consumed on social media.
Our goal in this article is to accurately describe the social media consumption of a set of respondents, grouped by ideology. We do so at a granular level, not only presenting aggregate statistics but also providing a sense of the distribution and variety of online media consumption among individuals. Our aim is to identify how many people live in "bubbles" and, in particular, how many people on the ideological poles live in bubbles. This has implications for one of the major political issues of our time: political polarization. If those on the left and right of center not only dwell in ideological cocoons but are unaware of what the other side sees, the concern is that those on opposite sides of the political spectrum will continue to misunderstand each other and rely on crude stereotypes, leading to further cycles of negative partisanship.

Data and Measurement
To measure ideological exposure on Twitter, we use data from a representative sample of Twitter users who are located in the United States and who were surveyed by the research firm YouGov during the 2016 U.S. election campaign. 2 For each respondent for whom we have data on accounts followed (N = 1,496; 642,345 total unique accounts), we collected the most recent 3,200 tweets sent by each of those accounts (~1.2 billion tweets in total). 3 In sum, our data contain all accounts followed and the most recent tweets potentially seen by our YouGov sample during and prior to 2016.
To examine the relationship between the ideology of Twitter users and the amount of exposure to ideologically diverse content, we need to measure both the ideology of our survey respondents and that of the accounts respondents follow and from whom they receive tweets. To measure respondent ideology, survey respondents were asked to locate themselves on an integer-valued ideological placement scale that ranged from 0 (liberal) to 100 (conservative). To ease presentation, we group respondents into ideological quintiles. This permits simpler interpretations of comparisons both between and within ideological categories. However, because we group by quintile, it is important to note that we are referring to each group's relative ideological rank among all Twitter users in the sample (ranges of the ideological self-placement variable that define these quintiles are presented in the Appendix). We also supplement our analysis by examining the left- and right-most 5% of the self-placement distribution to investigate behavior on the ideological extremes.
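The quintile grouping described above can be illustrated with a minimal sketch. The function name, the example scores, and the tie-handling rule (a score equal to a cut point falls in the lower group) are assumptions made for illustration; the actual cut ranges used in the article are those reported in its Appendix.

```python
from statistics import quantiles


def assign_quintiles(scores):
    """Return a quintile label per score (1 = most liberal, 5 = most conservative).

    Assumption: scores equal to a cut point are placed in the lower quintile.
    """
    # Four cut points split the sample into five groups of roughly equal size.
    cuts = quantiles(scores, n=5, method="inclusive")
    # A score's quintile is 1 plus the number of cut points strictly below it.
    return [1 + sum(score > cut for cut in cuts) for score in scores]


# Hypothetical self-placements on the 0 (liberal) to 100 (conservative) scale.
example_scores = [0, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
example_quintiles = assign_quintiles(example_scores)
```

Because the groups are defined by rank within the sample, the labels describe relative position among respondents rather than absolute positions on the 0 to 100 scale.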
To measure the political ideology of the accounts that the YouGov respondents follow and the tweets that they receive, we use estimates derived from the method developed by Barberá (2015a). The method is based on the assumption that Twitter users signal their political ideology by the accounts of media organizations, journalists, and politicians that they follow on Twitter. Users who follow primarily conservative politicians and conservative news media, for example, are assumed to be more likely to themselves be conservative than others who follow more ideologically moderate or liberal political actors. This assumption follows from the fact that individuals tend to associate with those who are similar to themselves, a regularity often referred to as homophily (McPherson et al., 2001). More technically, the model is similar to an item-response model in which the probability that a user follows a political actor's Twitter account is a function of the latent (ideological) spatial distance between the user and that political actor. The method has been used in recent work to examine selective exposure (Barberá, Jost, Nagler, Tucker, & Bonneau, 2015; Vaccari et al., 2016), to compare ideological estimates of political actors derived from social media to those from other data sources (Tausanovitch & Warshaw, 2017), and to examine the link between journalists' social media networks and the content they produce (Wihbey, Coleman, Joseph, & Lazer, 2017).
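To make the item-response intuition concrete, the following is a minimal sketch of a spatial following model. The specific functional form (a logistic function of popularity and activity terms minus a scaled squared distance) and the parameter names `alpha`, `beta`, and `gamma` are assumptions chosen for illustration; this section states only that the follow probability is a function of latent ideological distance.

```python
import math


def follow_probability(theta_user, phi_account, alpha=0.0, beta=0.0, gamma=1.0):
    """Probability that a user follows an account under an assumed spatial model.

    theta_user:  latent ideology of the user
    phi_account: latent ideology of the followed account
    alpha, beta: hypothetical account-popularity and user-activity terms
    gamma:       hypothetical weight on squared ideological distance
    """
    distance_sq = (theta_user - phi_account) ** 2
    linear = alpha + beta - gamma * distance_sq
    # Logistic link: a larger ideological distance lowers the follow probability.
    return 1.0 / (1.0 + math.exp(-linear))


# A hypothetical conservative user compared against a nearby and a distant account.
p_near = follow_probability(theta_user=1.0, phi_account=0.8)
p_far = follow_probability(theta_user=1.0, phi_account=-1.0)
```

Under these assumptions, the nearby account is followed with higher probability than the distant one, which is the homophily pattern the estimation exploits.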
Given the size of the Twitter data collected, for computational reasons our ideology estimates are approximated using correspondence analysis (Barberá et al., 2015; Greenacre, 2007).
In addition to aggregate comparisons of the accounts followed and the tweets received by survey respondents, we further categorize accounts that are followed into three types: (a) media elite, (b) political elite, and (c) non-elite (i.e., the mass public). Accounts defined as the media and political elite include the accounts of major media organizations, journalists, and politicians in the United States. "Non-elites" are therefore defined as the set of all Twitter accounts not included among the list of media and politicians. 4

Results
We begin by examining the number of politicians and media accounts that respondents follow on Twitter, both overall and by ideological quintile. If the average Twitter user in the United States follows very few or no media and/or political accounts, we might conclude that few reside in ideological bubbles simply because many choose not to use Twitter to receive political news at all. 5 In Table 1, we show the proportions of respondents who follow a given number of media and political accounts. As the table demonstrates, over one third of respondents (40%) follow no media accounts at all; over half (53%), no political accounts. There appears, however, to be a relatively large proportion of respondents who follow many of each: 18% of respondents follow 11 or more media accounts and 7% follow 11 or more political elites. To the extent that one third of Twitter users are not following any media accounts, we know that those users will only be in a "bubble" online if many of their friends share political information with them. We also examine whether stronger partisans are more likely to follow more media and political accounts than moderates by calculating the average number of accounts followed among those in each ideological quintile. Results are presented in Table 2. As expected, the average number of media accounts and political accounts followed by those on the ideological extremes is greater than those in the three middle quintiles. In other words, those on the ideological extreme are more politically engaged in their following behavior on Twitter than are moderates.
Next, we examine the ideological distribution of media accounts that are followed by our respondents, and compare this distribution to the ideological distribution of all media accounts. This comparison permits us to examine the extent to which users follow media accounts across the ideological spectrum proportional to the availability of media accounts across that spectrum. To investigate this, we show in Figure 1 the ideological distribution of all media accounts and the media accounts followed by our respondents weighted by the frequency by which they are followed. Well-known media accounts are labeled to provide context to the distribution. As the figure shows, accounts around and to the right of Fox News are overrepresented in the distribution of media accounts followed relative to the ideological distribution of all media accounts available, and the distribution of media followed appears bimodal. 6
We now examine whether and to what degree the following of media accounts varies by respondents' ideology. If all liberals were to follow only liberal media sources, and all conservatives were to follow only conservative media sources, then it would constitute clear evidence that people generally reside in ideological echo chambers. To investigate this, we graph the distributions of media accounts followed by respondents in each ideological quintile in Figure 2. The figures show that, as one would expect, respondents in the most liberal quintile follow a more liberal set of media accounts than do respondents in the most conservative quintile. There is, however, substantial overlap in the ideology of media accounts followed across quintiles. Each group, for instance, follows media accounts both to the right and left of the New York Times, though this region is much smaller for the most conservative quintile. Furthermore, each ideological group's distribution covers considerable area bounded by MSNBC on the left and the Wall Street Journal on the right. However, we also observe that the two most conservative quintiles have distributions with two modes: one in this common (mainstream) center and another mode on the right between Fox News and Breitbart. We examine this feature at the individual level further below.
For comparison, in Figure 3 we plot the ideological distribution of politicians followed by respondents in each quintile. Here we see much clearer separation: Respondents in the two right- and left-most quintiles clearly choose to follow politicians who are to the left or right of the zero point, and in fact they predominantly follow politicians to the left of Clinton (lower quintiles) and to the right of Trump (upper quintiles). We also present, in Figure 4, the ideological distribution of non-elite accounts followed by respondents in each quintile. For respondents in the most conservative quintile, the distribution of non-elite accounts is similar to the distribution of political accounts, with two modes and the bulk of accounts followed to the right of Trump. One point to be made by comparing these three figures is that following of media accounts is less polarized than is following of political and non-elite accounts. This suggests that many people are likely following media sources that are more moderate on average than the friends or politicians they follow.
Note (Table 1). Cell entries represent the percentage of respondents following the column number of media or political accounts.
Note (Table 2). Cell entries give the mean number of media and political accounts followed by respondents in each ideological quintile.
To present these data differently, in Table 3 we select descriptive (fixed) cut-points to provide meaningful context for the distributions of media accounts followed by respondents based on their ideological ranking. In each of the five quintiles, we give the proportion of all media accounts followed to the left or right of well-known media accounts by aggregating over all respondents. 7 In addition to the five quintiles, we also include the left-most 5% of respondents and right-most 5% of respondents. In the second row of Table 3, we show that 29% of all media accounts followed by liberals (those in the left-most quintile) are at least as far to the left as MSNBC (i.e., MSNBC or media accounts to the left of MSNBC), while only 4% of media accounts followed by liberals are at least as far to the right as Fox News. Another way of stating these results is that for the most liberal quintile, 71% of media accounts followed are to the right of MSNBC. However, virtually all of those accounts are also to the left of Fox News (the proportion of media accounts followed by this group that is as far to the right as Breitbart or beyond is 0). If we look at the second-left-most quintile of respondents, we still see only 1% of accounts followed are as far to the right as Breitbart and only 5% are as far to the right as Fox News.
When we look at the most conservative quintile of respondents, we see that an analogous share of followed accounts as above, 6%, are at least as far to the left as MSNBC. If we look at the next most conservative quintile, 9% of the media accounts respondents in that group follow are at least as far to the left as MSNBC. By contrast, 50% of media accounts followed by those in the most conservative quintile are at least as far to the right as Fox News, suggesting greater concentration in following behavior among that group. Furthermore, 12% of media accounts followed by those in that group are at least as far to the right as Breitbart, roughly the same share as that followed by respondents in the most conservative 5%.
Next, we turn to an analysis at the individual level. We define a respondent to be in a "bubble" if he or she is a liberal who does not follow a minimal proportion of conservative media accounts, or a conservative who does not follow a minimal proportion of liberal media accounts. In Table 4, we report the proportion of respondents in each ideological group who follow accounts in each range. Specifically, the table gives the proportion of people whose media diet includes at least 5% of followed accounts in one of the ranges identified (e.g., at least 5% of accounts followed are to the left of MSNBC). As the majority of respondents follow fewer than 20 media accounts, for most respondents this threshold simply requires them to follow at least one account in a given range. Since, as we saw, many of our respondents follow zero media accounts on Twitter, we report the proportion of respondents conditioning on following at least one media account.
Note (Figure 1). The gray line indicates the distribution of all media accounts that users can follow. The black line indicates the distribution of media accounts actually followed by survey respondents (weighted by the frequency by which they are followed by respondents).
Note (Figure 2). This figure presents the ideological distributions of media accounts followed by survey respondents, weighted by the frequency with which they are followed. The top panel shows the media accounts followed by all respondents; the bottom panel shows the media accounts followed within each ideological quintile.
If we look at the second row of Table 4, we see that 78% of those among the most liberal quintile of respondents who follow at least one media account have media diets in which at least 5% of the accounts they follow are at least as far to the left as MSNBC. More interestingly, we can see that only 1% of people in the most liberal quintile have media diets with 5% of followed accounts at least as far to the right as Breitbart, and only 16% of people in the most liberal quintile follow accounts at least as far to the right as Fox News. Thus if we think that for a liberal to never venture as far right as Fox News is to be in a bubble, 84% of respondents in the most liberal quintile are in a bubble (as are 85% in the second most liberal quintile). However, if we look at the respondents in the most conservative quintile, we see that 22% have a media diet that includes sources at least as far to the left as MSNBC. By the same logic above, this would imply that 78% of people in the most conservative quintile are in a bubble. However, if we look at the people in the right tail of the distribution (right-most 5%), we see a higher proportion of them following accounts at least as far to the left as MSNBC: 42% of respondents in the right-most 5% of the ideological distribution follow media accounts at least as far to the left as that account.
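The individual-level bubble definition used in this part of the analysis can be sketched as follows. The function names and the example ideology scores are hypothetical; only the 5% threshold, the exclusion of respondents who follow no media accounts, and the Fox News position (0.8, the value given later in the article) come from the text.

```python
def share_in_range(followed_scores, cut, side):
    """Share of followed accounts at least as far to the given side as `cut`.

    Returns None when the respondent follows no accounts of this type.
    """
    if not followed_scores:
        return None
    if side == "left":
        hits = sum(score <= cut for score in followed_scores)
    elif side == "right":
        hits = sum(score >= cut for score in followed_scores)
    else:
        raise ValueError("side must be 'left' or 'right'")
    return hits / len(followed_scores)


def in_bubble(followed_scores, cut, side, threshold=0.05):
    """True if fewer than `threshold` of followed accounts reach the cut point."""
    share = share_in_range(followed_scores, cut, side)
    if share is None:
        return None  # excluded: the analysis conditions on following >= 1 account
    return share < threshold


FOX_NEWS = 0.8  # ideology position reported in the article

# Hypothetical ideology scores of media accounts followed by two liberal respondents.
liberal_a = [-1.2, -0.9, -0.5, 0.1]
liberal_b = [-1.0, 0.9]

bubble_a = in_bubble(liberal_a, FOX_NEWS, "right")
bubble_b = in_bubble(liberal_b, FOX_NEWS, "right")
```

For a respondent following fewer than 20 media accounts, a single account in the range already clears the 5% threshold, which matches the observation in the text.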
One potential explanation for the asymmetric following behavior of liberals and conservatives shown above is that it is a function of asymmetries in the supply of and demand for news: People across the political spectrum find reason to follow the output of mainstream news organizations committed to norms of journalistic professionalism and equipped with newsrooms and reporting resources, even as these publications are perceived by some to be left-leaning and are trusted more highly by those on the left of center than those on the right (see Grossmann & Hopkins, 2016). Ideologically committed conservatives may have more of a demand for ideologically congenial content in addition to mainstream news due to these perceptions, while liberals are more likely to be satisfied with traditional sources of journalism.
Note (Figure 3). This figure presents the ideological distributions of political accounts followed by survey respondents, weighted by the frequency with which they are followed. The top panel shows the political accounts followed by all respondents; the bottom panel shows the political accounts followed within each ideological quintile.
Note (Figure 4). This figure presents the ideological distributions of non-elite accounts followed by survey respondents, weighted by the frequency with which they are followed. The top panel shows the non-elite accounts followed by all respondents; the bottom panel shows the non-elite accounts followed within each ideological quintile.
If true, we should see a more symmetric distribution when we look at the accounts of politicians followed. Thus in Table 5, we look at the set of politicians respondents follow, and see how many respondents follow politicians with viewpoints likely to be different than their own. Here we adopt politicians (Bernie Sanders, Hillary Clinton, Donald Trump, and Ted Cruz) as cut-points rather than media outlets. We see that most liberals still do not venture right: Only 16% of those in the left-most quintile who follow any politicians follow one such account at least as far to the right as Donald Trump, and only 8% follow a politician at least as far to the right as Ted Cruz. And we again see that conservatives are more likely to go left than liberals are to go right: Of respondents in the most conservative quintile who follow at least one politician, 35% follow a politician at least as far to the left as Hillary Clinton. Thus, our earlier result does not seem to be based on extraneous factors; conservatives on Twitter seem more inclined to look left than liberals do to look right.
In Table 6, we examine the set of non-elites followed. We see that 41% of liberals follow non-elite accounts at least as far to the right as Fox News, while 66% of conservatives follow non-elite accounts at least as far to the left as MSNBC. This of course suggests that 59% of liberals are living in "bubbles" with respect to the non-elite accounts they follow. For conservatives, only 34% of them are evidently in such bubbles. Again, this is consistent with our previous results.
While following activity arguably reflects some form of conscious intent, another way to measure whether or not respondents are in bubbles is to consider the sources of tweets they actually receive. We saw in Table 4 that liberals did not follow many right-leaning media accounts. However, if we count tweets potentially seen from media accounts based on ideology, things look somewhat different. Table 7 shows the proportion of tweets from media sources that are to the left or right of well-known accounts (an analogous table for the subset of users who follow at least one media account is provided in the Appendix). Whereas in Table 4 we saw that 16% of respondents in the most liberal quintile followed accounts at least as far to the right as Fox News, here we see that 27% of those in the most liberal quintile do get some share of their Twitter news diet from tweets at least as far to the right as Fox News. And while 22% of conservatives chose to follow media accounts at least as far to the left as MSNBC, 43% get some part of their media diet of tweets from sources at least as far to the left as that source. If we instead look at the proportion of politicians' tweets in respondents' feeds (Table 8), we again see that liberals and conservatives appear to have a more balanced information diet than if we simply look at following behavior. Finally, if we look at tweets that came from non-elite accounts, we see a more balanced distribution. In Table 9, we report proportions of respondents with tweet diets sent by non-elites falling within ranges defined by our anchor media outlets from above. These diets are more ideologically balanced than those comprising politicians or media accounts. 8 The difference here, however, is not as stark as it was for media or political accounts followed versus tweets seen.

Comparing Online and Offline Media Exposure
If we want to study whether or not people are in information bubbles overall, there is no reason to restrict the content of interest to what people consume online. 9 If conservatives choose to follow nothing to the left of Breitbart on Twitter but watch the CBS Evening News every night, then they are not in ideological bubbles: they have as much exposure to mainstream news as their pre-Internet-age parents did, and are merely supplementing it with the alternative viewpoint provided by Breitbart. In the survey, we therefore asked our respondents, "Do you watch news shows on any of the following networks, and if so, how often?" Response options were arranged in a grid with Fox News, CNN, MSNBC, CBS, ABC, and NBC as the network options, and "every day," "several times a week," "at least once a week," "less than once a week," and "never" as available frequencies. 10 We then look to see what proportion of conservatives in ideological bubbles on Twitter (i.e., less than 5% of the accounts they follow are at least as far to the left as MSNBC) diversified their media diets by watching news on one of the major networks at least once a week. Above we reported that 78% of conservatives do not go as far left as MSNBC in their following behavior on Twitter. However, 14% of that group of conservatives in ideological bubbles on Twitter did report watching left-leaning TV news (MSNBC). If we take these responses at face value and assume that network television is to "the left of" conservative media, we reduce the proportion of conservatives who live in "media bubbles" by about 11 percentage points. However, even with this adjustment, we still have 67% of self-identified conservatives with a media diet that does not stray as far to the left as MSNBC.
If we apply a similar adjustment to our liberal respondents in ideological bubbles on Twitter (i.e., those for whom less than 5% of the accounts they follow are at least as far to the right as Fox News), the effect is much smaller. Only 7% of liberal respondents who chose to be in ideological "bubbles" on Twitter report watching Fox News on television: Thus, we would revise our estimate of the number of liberals in ideological bubbles from 84% to 78%. 11 This "correction" for offline media exposure would become more important had we defined our bubbles more restrictively. Say we defined a conservative to be in an ideological bubble if fewer than 5% of the accounts they followed were to the left of the Wall Street Journal. Of the conservatives in our sample, 44% were in such conservative extreme-bubbles. However, 46% of those watched at least one of MSNBC, CNN, ABC, CBS, or NBC. Thus if we consider a larger part of their total media diet, online and television, we would estimate that only 24% of conservatives were in extreme media bubbles.
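The offline adjustments reported in this section follow a single piece of arithmetic: the share in a Twitter bubble is multiplied by the share of that group who do not report watching cross-cutting television news. The short sketch below reproduces the three reported figures; the function name is hypothetical.

```python
def adjusted_bubble_share(twitter_bubble_share, cross_cutting_tv_share):
    """Share still in a bubble after removing those who watch cross-cutting TV news."""
    return twitter_bubble_share * (1.0 - cross_cutting_tv_share)


# Conservatives: 78% in a Twitter bubble, 14% of whom watch MSNBC.
conservatives = adjusted_bubble_share(0.78, 0.14)
# Liberals: 84% in a Twitter bubble, 7% of whom watch Fox News.
liberals = adjusted_bubble_share(0.84, 0.07)
# Conservatives under the stricter Wall Street Journal cut: 44%, 46% of whom
# watch at least one of the listed networks.
extreme_conservatives = adjusted_bubble_share(0.44, 0.46)
```

Rounded to whole percentages, these give 67%, 78%, and 24%, matching the values in the text.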

Comparing Ideological Echo Chambers Across Individuals
Above we investigated the extent of ideological echo chambers using well-known media organizations and political actors as anchors. Because Twitter users each follow accounts and receive tweets that form unique ideological distributions, we can also use the overlap of these distributions as a more fine-grained measure of the degree to which individual users see content from ideologically similar accounts. If the ideologies of the accounts that two users follow are similar, for example, the ideological distributions of those accounts will heavily overlap; if they are ideologically distinct, such overlap will be minimal. In this way, the extent to which individuals or groups exist in ideological bubbles can be measured at the individual level by the degree to which the ideological distributions of followed users or received tweets overlap with those of others. Distributional overlap as a measure of ideological similarity has been used analogously in work examining political polarization (Lelkes, 2016;Levendusky & Pope, 2011) and to measure the degree to which the ideological slant of news media consumption differs between partisans (Guess, 2018). 12 To demonstrate the meaning of distributional overlap by illustration, we present three examples of overlapping distributions between two hypothetical social media users in Figure 5. The three panels demonstrate increasing levels of ideologically distinct content between two users, with the shaded region between distributions indicating the degree of overlap. In the first panel, the two users show highly similar ideological distributions (80% overlap); in the second panel, moderately similar ideological distributions (45% overlap); and in the final panel, extreme ideological dissimilarity (10% overlap). 13 To apply this measure to our data, we begin by calculating the mean overlap between the accounts followed and the tweets received by users within each ideological quintile. 
We ask, in other words, what level of overlap we can expect on average when comparing two Twitter users who identify as ideologically similar in our survey. This within-quintile average provides us with an assessment of the degree to which those with similar ideological outlooks follow ideologically similar accounts and receive tweets from ideologically similar users. In this set of analyses, we look at all accounts together, including media, political, and non-elite categories. These measures also serve as natural baselines against which to compare the ideological overlap between ideologically dissimilar users, which we examine further below. To calculate the mean overlap between users, we randomly sample pairs of users from the same ideological quintile and calculate the degree of overlap for both the ideological distributions of the accounts that they follow and those of the tweets that they receive. For each quintile, we sample 2,000 user-pairs and calculate the overlap between each. 14 In Figure 6, we then present the mean overlap of all sampled pairs within each quintile. Interestingly, the figure shows that although the mean within-group overlap between individuals in the bottom three quintiles is roughly equivalent (i.e., among the more liberal quintiles), the mean overlap between individual users' distributions in the upper two (conservative) quintiles is substantially smaller.
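The within-quintile procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the overlap is approximated with histograms on a shared grid rather than with the density estimation used by the authors' software, and the function names, grid limits, bin count, and simulated ideology scores are all hypothetical.

```python
import random


def overlap_coefficient(xs, ys, lo=-3.0, hi=3.0, bins=60):
    """Approximate the integral of min{f, g} with two histograms on one grid."""
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so scores outside [lo, hi] land in the outermost bins.
            idx = min(bins - 1, max(0, int((v - lo) / width)))
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p, q = proportions(xs), proportions(ys)
    # The sum of bin-wise minima is 0 for disjoint samples and 1 for identical ones.
    return sum(min(a, b) for a, b in zip(p, q))


def mean_pair_overlap(users, n_pairs=2000, seed=0):
    """Mean overlap over randomly sampled pairs of distinct users in one group."""
    rng = random.Random(seed)
    ids = sorted(users)
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.sample(ids, 2)
        total += overlap_coefficient(users[a], users[b])
    return total / n_pairs


# Toy quintile: 20 hypothetical users, each with 200 simulated ideology scores.
rng = random.Random(1)
quintile = {f"user{i}": [rng.gauss(-1.0, 0.5) for _ in range(200)] for i in range(20)}
print(round(mean_pair_overlap(quintile, n_pairs=200), 2))
```

A histogram estimate is sensitive to the bin width, so the values it produces are not directly comparable to the figures reported in the text.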
Note (Figure 6). This figure shows the mean ideological overlap of the distributions of all followed accounts and tweets received for respondents within each ideological quintile.
To investigate why this is the case, we show graphically in Figure 7 the ideological distributions of accounts followed by 15 randomly selected users in the most liberal quintile (left panel) and most conservative quintile (right panel). Differences in the distributions of the accounts users follow on the left and right are immediately apparent and suggest that the relative lack of within-quintile overlap among conservatives results from clear and substantial variation in the location and shape of distributions across individuals. While on the left the distribution of accounts followed is unimodal and similar across users, the users on the right appear to follow a mixture of two distributions. Substantively, these two distributions define what can be considered the political mainstream (at or to the left of Fox News, ideology parameter φ = 0.8) and the relatively far right, defined by media organizations such as Breitbart and well-known right-wing pundits such as Sean Hannity (φ = 1.42) and Glenn Beck (φ = 1.48). In other words, while most users on the left follow accounts and receive tweets from within a single similar ideological grouping, users on the right appear to select from two ideological environments (one mainstream, the other more conservative), with some users selecting either one or the other, and other users selecting from both. To see this further, we present in Figure 8 the distribution of the overlap coefficient for all sampled pairs from the bottom and upper ideological quintiles. Whereas for the lower quintile, most user-pairs heavily overlap, in the upper quintile they frequently do not.
Note (Figure 7). This figure presents the ideological distributions of all of the Twitter accounts followed by 15 randomly selected respondents from the lower (liberal) and upper (conservative) ideological quintiles.
We now turn to an examination of the overlap across ideological quintiles. We follow the same procedure as above, repeatedly selecting two users at random, one from each quintile of interest, and calculating the distributional overlap between them. We then calculate the average overlap for each between-quintile comparison. Results from this procedure are presented in Figure 9. As the figure shows, the between-quintile overlap is surprisingly large. Even between the two most distinct quintiles (the bottom and top), the mean overlap among pairs selected from those two quintiles is 0.51 for accounts followed. In other words, if we took a random liberal respondent and a random conservative respondent, just over half of the media accounts they follow would be from the same segment of the ideological distribution. This is not dramatically smaller than the overlap we would see if drawing two random conservatives, where we would expect 57% of the accounts they follow to be ideologically congruent. Results for the tweets that respondents receive are similar. The mean overlap among pairs selected from the most liberal and most conservative quintile is 0.44, a relatively large amount of overlap between those at opposite sides of the ideological spectrum. In sum, even in comparisons between users on the ideological extremes, there is nevertheless substantial overlap between the ideological distributions of the accounts followed and the accounts whose tweets they receive.
Note (Figure 9). This figure shows the mean overlap between user-pairs randomly selected from the indicated ideological quintiles for all accounts followed and all tweets.
Finally, to calculate a measure of overlap that is binary (a measure of information polarization), we calculate the probability that the distributions of a randomly selected user-pair are ideologically distinct by setting an arbitrary threshold for whether two users inhabit ideologically dissimilar environments. We define distributions as ideologically distinct if two users share less than one-third ideological overlap. The probabilities of two users within and across ideological quintiles sharing less than one-third overlap are shown in Figure 10. The results are substantively similar to those concerning mean overlap. As would be expected, the largest proportions of users inhabiting ideologically dissimilar environments are user-pairs from the lower and upper ideological quintiles: 29% of such pairings follow substantially ideologically distinct sets of accounts. But of course this implies that 71% of such pairs follow sets of accounts with substantial ideological overlap.
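The binary measure reduces to counting the sampled pairs whose overlap coefficient falls below the one-third threshold. A minimal sketch, assuming the overlap coefficients for the sampled pairs have already been computed; the function name and the ten example values are hypothetical, not data from the study.

```python
def share_distinct(overlaps, threshold=1 / 3):
    """Proportion of user-pairs whose overlap falls below the threshold."""
    if not overlaps:
        raise ValueError("need at least one overlap value")
    return sum(1 for o in overlaps if o < threshold) / len(overlaps)


# Hypothetical overlap coefficients for ten sampled cross-quintile pairs.
example = [0.10, 0.25, 0.30, 0.40, 0.45, 0.51, 0.55, 0.60, 0.70, 0.80]
print(share_distinct(example))  # prints 0.3
```

Because the threshold is arbitrary, as the text notes, it is exposed as a parameter so that other cut-offs can be tried.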

Can Weak Ties Increase the Consumption of Ideologically Diverse Content?
Social media research has typically sought to tackle the question of whether social media discourages the consumption of content from ideologically heterogeneous sources. It could also be the case, however, that social media could encourage its consumption. One possible mechanism for this is that social media platforms can magnify the connections between individuals who share what Granovetter (1973), in a classic article, calls "weak ties" (Barberá, 2015b). Weak ties are those between individuals who lack a deep emotional, reciprocal, and/or personal relationship, and can be thought of in analogous terms to the difference between personal friends (strong ties) and acquaintances (weak ties). Where for Granovetter (1973) weak ties provide access to others who may provide opportunities for employment, on social media weak ties provide opportunities for access to ideologically heterogeneous information.
Social media can magnify the power of weak ties for two related reasons. First, it facilitates the formation of these ties by simplifying the establishment of ongoing connections between individuals who have little (or no) deep personal connection (e.g., following on Twitter and friending on Facebook). Second, social media facilitates the maintenance of these ties by providing a straightforward flow of information between users (e.g., receiving tweets from followed accounts, and posts and comments on Facebook). The ease of forming, maintaining, and thus receiving ongoing information from weak ties can be beneficial for diversifying the ideological environment because weak ties provide bridges to locations in social networks which are unfamiliar.
As evidence of the ideologically diversifying power of social media, Barberá (2015b) shows that the accounts that Twitter users initially follow are typically more ideologically extreme than the accounts that they follow later. Most Twitter users, in other words, appear to increasingly follow more ideologically moderate accounts over time. One possible reason for this is that users themselves become increasingly ideologically moderate. Social media, by providing an ideologically heterogeneous environment, in other words, could affect the ideology of its users, who consequently follow an increasingly moderate set of accounts. A second, complementary mechanism is that retweets (tweets from accounts that are not directly followed) provide an important indirect path to receive new information from a diverse set of accounts that were previously unknown, and thus a set of new, potentially more moderate accounts to be subsequently followed. Social media can thus provide a platform for ideologically diverse discovery.
To provide evidence suggestive of this second mechanism, we examine the degree to which the retweets that users receive are from a more ideologically diverse or moderate set of accounts than are the tweets that users receive from accounts that they follow directly. 15 Our empirical expectations are the following. First, we expect that the ideological distributions of the accounts seen via retweets will be wider (i.e., more diverse) than those received directly from followed users: As many users followed on Twitter will likely be weak ties, their retweets will expose followers to a more diverse set of users than those selected for following. Second, the ideologies of the set of accounts seen via retweets will be more moderate than those from the accounts that users follow, providing suggestive evidence of a mechanism by which users increasingly follow more moderate accounts over time.
Figure 10. Proportion of user-pairs within and between ideological quintiles with less than one-third overlap.
To measure ideological diversity, for each user we calculate the standard deviation of the ideological distributions of (a) the tweets received that are authored by accounts a user follows and (b) the tweets received that were retweeted by the followed accounts. We then calculate the difference in these measures, subtracting the standard deviation of the estimated ideologies of tweets by followed accounts from that of retweets. Thus, a positive value indicates a retweet distribution that is more ideologically diverse. Note that in each case the ideology of a tweet is measured by the ideology of the author of the tweet: for a tweet authored by a user who is followed, this is the ideology of the user who is followed directly; for a retweet, the ideology of the author of the tweet (i.e., the ideology of the user whose tweet is being retweeted). We calculate these differences in ideological diversity for all tweets received by each user. We also compute these differences separately for the tweets from media, politicians, and non-elites.
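The diversity measure can be sketched as follows. This is an illustration only: the text does not say whether the sample or the population standard deviation was used, so the sample version (`statistics.stdev`) is an assumption, and the function names and toy ideology scores are hypothetical.

```python
from statistics import stdev


def diversity_difference(authored, retweeted):
    """SD of retweet-author ideologies minus SD of followed-author ideologies."""
    return stdev(retweeted) - stdev(authored)


def share_more_diverse(users):
    """Share of users whose retweet distribution is wider than their authored one."""
    diffs = [diversity_difference(a, r) for a, r in users]
    return sum(1 for d in diffs if d > 0) / len(diffs)


# Hypothetical users: (ideologies of directly received tweets, ideologies of retweets).
toy_users = [
    ([-1.0, -0.8, -1.2, -0.9], [-1.5, -0.2, 0.4, -1.0]),
    ([0.9, 1.1, 1.0, 1.2], [0.2, 1.4, 0.8, 1.6]),
    ([-0.5, 0.5, -0.4, 0.6], [0.0, 0.1, -0.1, 0.05]),
]
print(share_more_diverse(toy_users))
```

Each score in the lists stands for the ideology of a tweet's author, matching the convention described in the paragraph above.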
Results are presented in Figure 11. In all four panels we see that the bulk of the mass of all four distributions is positive, indicating that for the vast majority of respondents, the ideological distribution of retweets is more varied than that of tweets from followed accounts. Among all tweets received (the "Overall" panel), 80% of users have retweet distributions that are more ideologically diverse than those of tweets from followed accounts. This difference in ideological diversity of tweets and retweets holds, furthermore, when the tweets are disaggregated by type: 85% for politicians' tweets, 72% for media tweets, and 81% for tweets from non-elites. Among all tweets, the mean standard deviation of the ideology of tweets authored by accounts that are followed is 0.78, and for retweets it is 0.87 (p < 0.001).
Note (Figure 11). This figure shows the differences in the standard deviations of tweets seen from accounts that are directly followed by respondents and those from accounts that are retweeted. A difference greater than zero indicates a wider ideological distribution among retweeted accounts than accounts that are followed directly by respondents.
Although these patterns confirm our expectations, it is important to note that the size of this difference is not meaningfully large. To illustrate this graphically, we present the authored tweet and retweet distributions for 15 randomly selected users from the lower ideological quintile and 15 from the upper ideological quintile in Figure 12. In general, the retweet distributions are wider, but the differences are minimal. Incidental exposure, in other words, does not provide a strong pathway through which users will consume material from an ideologically varied set of social media users. Instead, selection of whom respondents follow appears to mostly drive the ideological distribution of what respondents see.
Note (Figure 12). This figure presents the ideological distributions of the accounts that respondents directly follow and the accounts that respondents see as a result of retweets (weighted by the frequency of tweets received from each account).
Finally, we investigate whether the ideology of the accounts whose retweets users see is more moderate than that of the accounts that users choose to follow. To do so, we calculate the difference between the mean ideology of the accounts users are exposed to as a result of retweets and the mean ideology of the accounts that users directly follow, weighted by the frequency of tweets received from each account. Results are presented in Figure 13. As the figure demonstrates, the mean ideology of the authors of retweets is more conservative for users who are liberal and more liberal for users who are conservative. In other words, retweets that users receive generally originate from accounts that are more moderate than the accounts they follow directly. Thus, while retweets do not greatly increase the ideological diversity of accounts seen by individual users, they do have the effect of presenting conservatives with more liberal tweets than those they would otherwise see, and of presenting liberals with more conservative tweets than they would otherwise see.
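The weighted difference in means can be sketched as below. This is a hedged illustration, not the authors' code: the function names and the toy accounts are hypothetical, and positive ideology scores are taken to denote conservative positions, consistent with the positive φ values reported for right-leaning accounts.

```python
def weighted_mean(accounts):
    """accounts: list of (ideology, tweets_received) pairs."""
    total = sum(n for _, n in accounts)
    return sum(score * n for score, n in accounts) / total


def retweet_shift(followed, retweeted):
    """Mean ideology of retweeted authors minus mean ideology of followed accounts."""
    return weighted_mean(retweeted) - weighted_mean(followed)


# Hypothetical liberal user: followed accounts lean left, retweeted authors less so.
liberal_followed = [(-1.2, 30), (-0.8, 10)]
liberal_retweeted = [(-0.9, 5), (0.1, 5)]

# A positive shift means retweets pull this user's feed in a conservative direction.
print(retweet_shift(liberal_followed, liberal_retweeted) > 0)  # prints True
```

For a conservative user the same function would return a negative value when retweets originate from more liberal accounts than those followed directly.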

Conclusion
There is legitimate concern that sorting into mutually exclusive informational cocoons could lead to wildly divergent factual beliefs among the mass public, fueling polarization, stymieing awareness of alternative viewpoints, and providing no basis for political compromise. By examining behaviorally valid data reflecting the online media diets of a sample of Americans, we have taken a step toward quantifying the empirical basis for this concern. Our results provide a nuanced portrait of the information environments of Americans on Twitter. Most critically, we do not find evidence supporting a strong characterization of "echo chambers" in which the majority of people's sources of news are mutually exclusive and from opposite poles (e.g., Sunstein, 2017): There is generally more overlap than divergence in the ideological distributions of media accounts followed by the most liberal and most conservative quintiles in our sample. However, we also show that fully 61% of members of the most conservative quintile in our sample follow very few media accounts even as far "left" as the New York Times, suggesting their online media diet is quite ideologically constrained. We note, however, that of those 61% of conservative respondents who rarely venture as far as the New York Times on Twitter, 45% of them did report watching TV news from either a mainstream or left-leaning network. Thus examining only online behavior and ignoring offline behavior can leave us with a misleading view of the amount of exposure to mainstream news that still goes on.
The pattern of relative overlap across ideological quintiles also masks asymmetries in individual liberals' and conservatives' tendencies to venture beyond fixed ideological positions in their following behavior. For instance, while only 16% of respondents in the left-most ideological quintile and 15% in the second-left-most quintile have social media diets with at least 5% of the media accounts they follow as far to the right as Fox News, in the right-most quintiles, the shares of respondents with at least 5% of followed accounts at or to the left of MSNBC are 22% and 30%. Given that the ideological positions of both cable networks are comparable in absolute value according to some measures, this is a notable imbalance in following activity. And these numbers are also stark in what they reveal about social media exposure among the public: 85% of liberals are not being exposed to an ideological view (Fox News) that over two thirds of the most conservative quintile are being exposed to. And 78% of conservatives are not being exposed to an ideological viewpoint (MSNBC) that 78% of liberals are being exposed to. Thus while there is considerable overlap that both groups see, there are also areas of the media ecosystem that are primarily viewed by members of one ideological group. One possible interpretation of these findings is that the significant overlap in patterns of exposure to media content on Twitter ensures that a set of people across the ideological spectrum are receiving similar news and information about politics. Critically, however, large portions of the most liberal and most conservative users never see what the other side is saying: how they are reacting to that news and information, and what they find to be most important.
Figure 13. Difference in means between tweets received from accounts followed and accounts retweeted.
Note. This figure shows the difference in means of the estimated ideology of the accounts followed by respondents and the accounts seen as a result of retweets (weighted by the number of tweets received from each account).
While fully explaining observed asymmetries in following behavior is beyond the scope of this article, we suggest that the differences could lie in both supply-side and demand-side factors. Partisan outlets in the conservative media ecosystem are simply not a mirror image of the mainstream news outlets that they often position themselves against. Opinion plays an outsize role in this ecosystem, while it is only a part of what mainstream news organizations produce. By the same token, professional journalistic norms are not as widely established in partisan conservative media, so that conservatives with a taste for both traditionally reported news and congenial opinion may need to sample more widely than liberals with analogous tastes (Grossmann & Hopkins, 2016). On the demand side, some have argued that conservatives rely on a wider set of moral foundations than liberals, leading to greater "moral pluralism" and more willingness to engage with different belief systems (Graham, Haidt, & Nosek, 2009). We suspect that the answer may also be related to the other pattern revealed in our data: that the most conservative respondents in our sample tended to have one of two distinct media diets, with ideological distributions either centered around more mainstream news outlets or skewed exclusively toward strongly right-wing sources.
Our research design also allowed us to explore distinctions between accounts followed, on one hand, and all tweets potentially seen, on the other. When analyzing social media diets using all "received" tweets, we find levels of overlap between users at the opposite ends of the political spectrum similar to those found when looking only at accounts followed. Also, when looking at received tweets instead of accounts, larger shares of the most liberal and conservative respondents are potentially exposed to counter-attitudinal content.
We additionally find that retweets lead to somewhat more ideologically heterogeneous information environments than looking at directly authored tweets alone. The magnitude of this difference is statistically significant but not especially large, with the exception of tweets from politicians. This suggests the limits of weak ties as conduits for incidental exposure to diverse content on Twitter. By contrast, we note that the relatively moderating effect of incidental exposure to retweets (resulting in more conservative information environments for liberal users and more liberal ones for conservative users on average) appears robust and substantial.
This study illustrates the inferential advantages of linking surveys with digital trace data. We combine the strengths of traditional survey research (a well-defined sample with rich individual-level covariates) with large-scale social media analysis, which enables us to reverse engineer the online information environment of our respondents using publicly available behavioral data.

Notes
Guess, Lyons, Nyhan, & Reifler, 2018; Nyhan, 2014, 2016.
2. The sample is from a nationally representative survey conducted by YouGov that oversampled people on social media and included a set of respondents who had previously consented to provide YouGov with their Twitter handles. The panel survey included three waves, beginning with N = 3,500 respondents in March 2016 with the last wave occurring in October 2016. See Guess, Munger, Nagler, and Tucker (forthcoming) for additional details on survey data and collection.
3. We refer to tweets that someone may have seen as tweets that the user "received" or was "exposed to." To be precise, we are capturing potential exposure; we do not know how often a respondent checked his or her Twitter feed or what fraction of tweets they read.
4. There were 2,180 media accounts and 649 politicians' accounts.
5. They might also, of course, be exposed to political news via their non-political friends, which we address further below.
6. If accounts on the right tweet more than accounts on the left, users might see more from the right even if they were to follow accounts on the left and right equally. As we show later, however, weighting the ideological distributions by tweet frequency has no meaningful effect on the shape of the distribution.
7. We choose MSNBC and Fox News as two fixed points of comparison in the text because the estimated media slants of the two are roughly equidistant from the midpoint of the ideological spectrum. For example, "alignment" scores generated by a supervised learning method trained on Facebook sharing data suggest positions (on a scale of −1 to 1) of −0.81 and 0.78 for the associated web domains of the two cable networks, respectively. (See Bakshy, Messing, and Adamic, 2015, for details.)
8. It is worth noting that the vast majority of tweets sent by this group is likely not about politics (Guess, Munger, et al., forthcoming).
9. Or, more precisely, on Twitter.
10. Unfortunately, unlike our Twitter measures, our measures of TV news exposure are subject to well-known biases and distortions generated from survey-based self-reports (Guess, 2015) and should be interpreted accordingly.
11. Whereas 12% of all liberal respondents watched Fox News on TV.
12. The overlap, δ, between two densities f and g is defined as $\delta(f, g) = \int \min\{f(y), g(y)\}\,dy$. To calculate this quantity for each pair of densities of interest, we use the R package overlap (Meredith & Ridout, 2017).
13. The overlap coefficient is bounded by 0, indicating no overlap between users, and 1, indicating complete overlap.
14. We use random sampling due to the computational burden of calculating the overlap coefficient for all $(N^2 - N)/2$ user-pairs (in our data, over 1 million user-pairs).
15. There is necessarily some conflation of direct and indirect exposure as defined by authored tweets and retweets: tweets that are authored by a user may also include links that contain ideologically diverse information. Our interest, however, is in the indirect exposure of users to others whom a user may not explicitly follow, as analogous to the case laid out by Granovetter (1973).