Who Voted for Brexit? Individual and Regional Data Combined

Previous analyses of the 2016 Brexit referendum used region-level data or small samples based on polling data. The former might be subject to ecological fallacy and the latter might suffer from small-sample bias. We use individual-level data on thousands of respondents in Understanding Society, the UK’s largest household survey, which includes the EU referendum question. We find that voting Leave is associated with older age, white ethnicity, low educational attainment, infrequent use of smartphones and the internet, receiving benefits, adverse health and low life satisfaction. These results coincide with corresponding patterns at the aggregate level of voting areas. We therefore do not find evidence of ecological fallacy. In addition, we show that prediction accuracy is geographically heterogeneous across UK regions, with strongly pro-Leave and strongly pro-Remain areas easier to predict. We also show that among individuals with similar socioeconomic characteristics, Labour supporters are more likely to support Remain while Conservative supporters are more likely to support Leave.


Introduction
Populism has been on the rise across Europe and the United States in recent years, culminating in the election of Donald Trump as US President and the Brexit vote in the 2016 EU referendum. The Brexit vote came as a shock to many observers and triggered early attempts to understand the voting patterns. These studies relied almost exclusively on aggregate data at the level of voting areas. Regressing vote shares across voting areas on average population characteristics risks falling into the ecological fallacy trap of inferring individual associations from aggregate data (see Robinson, 1950). We use detailed individual-level data from the Understanding Society survey containing the EU referendum question to address three interrelated questions.
First, we investigate the relationship between voters' personal characteristics and their expressed voting intentions. In particular, we address whether ecological fallacy may be driving the associations documented in the aggregated data.
Second, building a predictive model of Leave support, we assess which voting determinants have the most power to predict voting behavior out of sample.
Third, we investigate the classification errors that this predictive model makes by region and voters' closeness to political parties.
First, we find that individual and aggregate coefficients point in a similar direction, suggesting that ecological fallacy is of limited concern. Second, we document that the predictive models exhibit a significant gain in accuracy when exploiting both individual and regional variables. Lastly, we document that a predictive model performs best in parts of the UK with the most extreme referendum outcomes: Lincolnshire (highest Leave share) and London (lowest Leave share across mainland Britain). Furthermore, a decomposition of classification errors reveals that closeness to a political party is likely an important omitted variable, suggesting that unobservable traits and identity are further key correlates.
The paper is structured as follows. Section 2 lays out the literature background, describes the data and explains our empirical approach. We present graphical summaries of our results in Section 3, and we conclude in Section 4.
Underlying regression results and further details are relegated to an appendix.
2 Background, data and empirical approach

Background
This paper builds on Becker et al. (2017) who analyze the Brexit vote shares across UK voting areas, using a wide range of explanatory variables. They show that the Leave vote shares are systematically correlated with older age, lower educational attainment, unemployment, or employment in certain industries such as manufacturing, as well as with a lack of quality of public service provision.
These results fit in with other evidence on the Brexit vote. An early attempt to explain the referendum outcome was made by Ashcroft (2016) whose polling data indicated that the typical Leave voter is white, middle class and lives in the South of England. Sampson (2017) reviews the literature on the likely economic consequences of Brexit on the British economy and other countries.
Our paper also relates to the wider literature on political polarization as well as on voting for far-right parties. Ferree et al. (2014) provide an extensive review of academic works which link voting patterns to demographic, economic and political features. Voters' behaviour has also been shown to be strongly associated with individual scepticism towards institutions (e.g. Euroscepticism) or intolerance against foreigners (see Whitaker and Lynch, 2011; Clarke et al., 2016 and Arzheimer, 2009). Additional studies claim that ethnic minorities may engage in 'ethnic' or 'policy' voting depending on the issue they are called to vote upon (see Bratton and Kimenyi, 2008 and Tolbert and Hero, 1996).
Polarization has also been related to immigration (see Barone et al., 2016) as well as trade integration (Dippel et al., 2015; Burgoon, 2012 and Autor et al., 2016). In the UK context, Becker and Fetzer (2016) examine immigration from Eastern Europe as a potential driver of support for the UK Independence Party, while Fetzer (2018) explores the role of austerity policies since 2010.
Overall, the voting patterns in the Brexit referendum are complex. One possible, albeit not the only, interpretation of the empirical literature on Brexit so far is that some people who favor Leave may feel 'left behind', be it economically or culturally (see Hobolt, 2016 and Clarke et al., 2017). This is consistent with sociological studies which demonstrate similar patterns for the Tea Party Movement and the 2016 US presidential election, e.g. Hochschild (2016).

Data
Are these aggregate patterns found by Becker et al. (2017) and others a fair reflection of individual-level relationships? The individual-level data from wave 8 of the Understanding Society survey makes it possible to investigate this question. Our focus is on individual socio-economic variables for which region-level equivalents are used in Becker et al. (2017). Our approach of combining individual-level and aggregate data allows us (a) to check whether ecological fallacy is an important factor in aggregate analyses of the Brexit vote, and (b) to exploit the combined predictive power of individual-level and aggregate variables. This opens up insights into (c) geographic heterogeneity in predictive power across UK regions.
The Understanding Society data cover a wide range of topics, in particular basic demographic data for all household members such as sex, age and ethnicity, place of birth, family background including marital status, educational attainment, current job characteristics, housing characteristics (owning vs. renting), health status and life satisfaction. We describe the sampling design and the construction of our sample in more detail in an appendix (also see Knies, 2016).

Descriptive statistics
As for demographics, the summary statistics in Table 1 show that males account for 45.4% of all individuals in the sample, while just about three out of ten respondents are aged 60 or above. People with no qualification account for about 8% of the sample.
Roughly 90% of respondents were born in the UK. Asians are the largest ethnic minority, amounting to 5.8% of the sample, followed by blacks (2.5%). 2 Over half of respondents are married or in a civil partnership. In terms of current employment, roughly four out of ten people report being without a paid job or not having worked in the seven days prior to being questioned. 3

Understanding Society: Research in progress
We gained access to Understanding Society data in the summer of 2017, at the same time as other groups of researchers in a pilot 'early access' project. We briefly summarize related preliminary findings reported by other researchers in short presentations in the summer of 2017. For instance, Creighton and Amaney (2017) find that opposition to immigration played a key role. Martin and Sobolewska (2017) explore racial determinants and find that ethnic minorities are strongly in favor of remaining in the EU. De Vries and Solaz (2017) attempt to explain voters' behavior by analyzing socio-economic determinants such as asset holdings, sources of income and skills, whereas Doebler et al. (2017) explore additional potential drivers such as personal economic struggle and regional economic decline.
As far as we are aware, only one other paper using Understanding Society data has come out as a working paper so far. Liberini et al. (2017) show that individuals dissatisfied with their own financial situation were more likely to vote Leave and that the very young were most likely to vote Remain. In related work, Pollock (2017) uses the Innovation Panel to argue that the rise in populism and the vote in favor of Brexit can be attributed to generational shifts away from mainstream political parties over the past three decades.
2 Note that we sourced nationality and ethnicity variables also from earlier waves.
3 The aggregate variables in Table 1 are not standardized for descriptive purposes, but they are in all regressions.

Empirical approach
We start with a simple model where the dependent variable $y_{ic}$ is a dummy for individual $i$ in local authority $c$ which takes on the value 1 if the interviewed person answers "Leave" in response to the question "Should the UK remain a member of the EU or leave the EU?" and 0 if the answer is "Remain":
$$y_{ic} = x_{ic}'\beta + z_c'\gamma + \varepsilon_{ic} \qquad (1)$$
The independent variables in the model are the Understanding Society cross-sectional individual covariates $x_{ic}$ on the one hand, and area-specific aggregate variables $z_c$ from Becker et al. (2017) on the other; $\beta$ and $\gamma$ denote the corresponding coefficient vectors and $\varepsilon_{ic}$ is an error term. Our overall sample contains 13,136 respondents for our baseline regressions. We also analyze smaller samples and subgroups of variables since not all Understanding Society respondents were asked each survey module. As the summary statistics in Table 1 show, roughly 42% of respondents are in favor of Leave.
We relegate the details of the underlying regression results to the appendix.
For ease of interpretation, throughout the regression tables in the appendix we provide coefficients obtained from a simple linear probability model estimation of equation (1). However, each model is also estimated using the corresponding logistic regression model to provide an estimate of the success rate at the bottom of each table.
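As a notational sketch only, the two estimators mentioned here can be written side by side. The coefficient symbols $\beta$ and $\gamma$ are assumed labels; only $y_{ic}$, $x_{ic}$ and $z_c$ are taken from the definitions of equation (1).

```latex
\begin{align}
\Pr(y_{ic} = 1 \mid x_{ic}, z_c) &= x_{ic}'\beta + z_c'\gamma
  && \text{(linear probability model)} \\
\Pr(y_{ic} = 1 \mid x_{ic}, z_c) &=
  \frac{\exp\left(x_{ic}'\beta + z_c'\gamma\right)}
       {1 + \exp\left(x_{ic}'\beta + z_c'\gamma\right)}
  && \text{(logistic regression)}
\end{align}
```

The linear form yields coefficients that read directly as changes in the probability of answering "Leave", while the logistic form keeps predicted probabilities between 0 and 1, which is convenient for classification.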
Since our interest centers on prediction, we need a metric to assess predictive accuracy of our regression models. We perform a simple validation exercise known from the machine learning literature. Our sample is divided into a random training set (2/3 of the sample) and a validation set. Logistic regressions are conducted on the training set, and we use the validation set to perform classification. We follow Bayes' optimal decision rule and classify an observation as "Leave" if the predicted posterior probability exceeds 50%. In essence, this rule simply allocates the label ("Leave" or "Remain") to an observation that, condi-

Predicting the vote
In order to focus on prediction quality, we relegate the discussion of individual regression tables to the appendix. First, we focus on the relative predictive power of individual-level and aggregate variables. Second, we examine the predictive power of our best-performing model across regions and lastly, we investigate the classification error structure. With all variables included, accuracy rises to 63.4%, an improvement of 9.7% in relative terms over the naive benchmark.
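The validation exercise behind these accuracy figures can be illustrated with a minimal sketch. It assumes that the naive benchmark always predicts the majority answer, which the text does not spell out; the function names and the toy probabilities are hypothetical, and the fitted logistic model itself is left out.

```python
import random

def split_sample(rows, train_share=2 / 3, seed=0):
    """Randomly split rows into a training set and a validation set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def classify(prob_leave, threshold=0.5):
    """Label an observation Leave (1) if its predicted probability exceeds the threshold."""
    return 1 if prob_leave > threshold else 0

def success_rate(probs, labels):
    """Share of observations whose predicted label matches the observed answer."""
    hits = sum(classify(p) == y for p, y in zip(probs, labels))
    return hits / len(labels)

def naive_success_rate(labels):
    """Success rate of always predicting the majority answer (assumed benchmark)."""
    share_leave = sum(labels) / len(labels)
    return max(share_leave, 1 - share_leave)

# Hypothetical predicted probabilities and observed answers (1 = Leave, 0 = Remain).
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 0]
train, validation = split_sample(range(9))
```

In practice the probabilities would come from the logistic regression fitted on the training set and be evaluated on the validation set only.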

Individual vs. aggregate variables
Furthermore, an inspection of the tables in the appendix confirms that the individual-level predictors yield broadly similar sign patterns to their aggregate level equivalents. This suggests that ecological fallacy is not a major concern for the results in Becker et al. (2017).
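The sign comparison can be illustrated with a toy check. The records below are entirely hypothetical (area, indicator for being aged 60 or above, indicator for answering Leave), and the simple bivariate slopes are a stand-in for the full regressions in the appendix.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope from a bivariate least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def area_means(records):
    """Collapse individual records to area-level shares of x and y."""
    grouped = defaultdict(list)
    for area, x, y in records:
        grouped[area].append((x, y))
    xs = [sum(x for x, _ in rows) / len(rows) for rows in grouped.values()]
    ys = [sum(y for _, y in rows) / len(rows) for rows in grouped.values()]
    return xs, ys

# Hypothetical records: (area, aged 60 or above, answers Leave).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 0, 0), ("B", 0, 0), ("B", 0, 0),
    ("C", 1, 1), ("C", 1, 1), ("C", 1, 1), ("C", 1, 0),
]

individual_slope = ols_slope([x for _, x, _ in records], [y for _, _, y in records])
aggregate_slope = ols_slope(*area_means(records))
signs_agree = (individual_slope > 0) == (aggregate_slope > 0)
```

If the two slopes had opposite signs, inferring individual behavior from the area-level regression would be misleading, which is the ecological fallacy concern.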
The combination of individual and aggregate characteristics yields a further slight improvement in prediction accuracy. Relative to the naive classification rule, accuracy can improve up to 64.6% with all covariates included, representing an improvement of 11.7% in relative terms. Adding further individual-level characteristics that are included in the Understanding Society sample (but for which no aggregate proxy measures exist) suggests that overall accuracy is not further improved.
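The relative improvements can be cross-checked with simple arithmetic. The sketch below backs out the benchmark success rate implied by each reported pair of figures; the helper names are hypothetical, and the benchmark rate itself is inferred here rather than reported in the text.

```python
def implied_naive_rate(model_rate, relative_gain):
    """Benchmark success rate implied by a model rate and its relative gain."""
    return model_rate / (1 + relative_gain)

def relative_gain(model_rate, naive_rate):
    """Improvement of the model over the benchmark, relative to the benchmark."""
    return (model_rate - naive_rate) / naive_rate

# Reported pairs: (success rate, relative improvement over the naive rule).
naive_from_first = implied_naive_rate(0.634, 0.097)
naive_from_second = implied_naive_rate(0.646, 0.117)
```

Both pairs imply a benchmark of about 58%, which is what always predicting "Remain" would achieve in a sample where roughly 42% favor Leave.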
In fact, our best model including all characteristics sees a small drop in the success rate. In terms of the bias-variance trade-off inherent in such predictive models, the improvement in terms of bias is therefore likely offset by an inflation in terms of variance, resulting in worse out-of-sample performance.
We refer to James, Witten, Hastie and Tibshirani (2013) for a discussion of the bias-variance trade-off.
As explained in the appendix, we explore a number of novel individual determinants. We find that marital status, technology use and dependence on income support and state benefits are all systematically linked to individual voting behavior. In particular, individuals who do not possess smartphones and who use the internet infrequently appear more inclined to support Leave. Those repeatedly seeking health care or receiving income support also tend to be more in favor of Brexit. Similarly, support for Brexit is concentrated among white respondents rather than ethnic minorities.

Geographical heterogeneity
An instructive step is to examine in which regions our model does a good job of correctly classifying the voting intentions in the Understanding Society sample. Among all NUTS2 regions in Figure 2, Inner London displays the lowest error rate (21%) followed by Lincolnshire and North Eastern Scotland (with 23% and 26%, respectively). Lincolnshire and Inner London had among the highest and lowest Leave vote shares in the referendum. Thus, it is hardly surprising that the empirical model performs well in separating voters in these regions.
The model has the lowest performance in Tees Valley and Durham, East Anglia, and Merseyside (with error rates around 43-44%). Generally, the picture that emerges suggests that purely based on the socio-economic characteristics, areas that are more disadvantaged are the ones where it is most difficult to separate Leave from Remain voters. Non-economic factors may therefore be particularly helpful in capturing variation between voters in these areas.
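A regional breakdown of this kind amounts to grouping classification errors by region. The following is a minimal sketch with hypothetical rows (region, predicted answer, observed answer); it does not reproduce the reported error rates.

```python
from collections import defaultdict

def error_rate_by_region(rows):
    """Share of misclassified observations within each region."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for region, predicted, observed in rows:
        totals[region] += 1
        errors[region] += int(predicted != observed)
    return {region: errors[region] / totals[region] for region in totals}

# Hypothetical validation rows: (region, predicted, observed), 1 = Leave, 0 = Remain.
rows = [
    ("Inner London", 0, 0), ("Inner London", 0, 0),
    ("Inner London", 0, 1), ("Inner London", 0, 0),
    ("Merseyside", 1, 0), ("Merseyside", 0, 0),
    ("Merseyside", 0, 1), ("Merseyside", 1, 1),
]

rates = error_rate_by_region(rows)
# Regions ordered from easiest to hardest to classify.
ranking = sorted(rates, key=rates.get)
```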

Types of errors
We turn to decomposing errors into false positives and false negatives. The results presented in Figure

Conclusion
Individual-level regressors from the British Understanding Society survey containing the 2016 EU referendum question give similar results to corresponding aggregate variables at the level of local authority areas analyzed by Becker et al. (2017). We therefore find no evidence of ecological fallacy effects: individuals appear to behave in similar ways as suggested by the aggregate data.
We also shed light on the predictive power of different determinants of the Leave vote. Demographics and employment characteristics are the most relevant covariates for prediction, while the cumulative power of individual-level and aggregate variables shows a non-negligible gain over aggregate data alone. Geographical heterogeneity is also important as our model performs best in more prosperous areas (London in particular).
Finally, we also find that individuals who support the Labour party but whose observable characteristics would otherwise put them in the Leave camp are significantly more likely to vote Remain. Conversely, supporters of the Conservative party with Remain-favouring characteristics are more likely to vote Leave.
De Vries, C. and H. Solaz (2017). Brexit and the Left Behind? Disentangling the Effects of As-

A Data and regression results
In this appendix we present our data and empirical regression results in more detail.

A.1 Sampling design
Concerning the design and data collection of Understanding Society, the general population sample is a stratified, clustered, equal probability sample of residential addresses drawn to a uniform design throughout the whole of the UK. For each wave, the data collection is spread over a two-year period, and the overall sample is divided into 24 monthly subsamples, each independently representative of the UK population. Computer assisted personal interviewing (CAPI) was mainly used to collect the data. 1

A.2 Constructing the sample
The construction of our sample takes place in various steps. Initially, the raw individual survey (wave 8) consists of 21,076 observations. Then, matching the household survey leaves 20,821 individuals. Further matching with local authority codes results in a sample of 17,697 respondents (i.e. over 3,000 surveyed individuals get lost because there is no location code associated with their households). Finally, we merge this last sample with the aggregate information used in Becker et al. (2017). In this last step, the number of surveyed individuals is 15,844 across 377 local authorities.
1 These details are taken from Understanding Society: Design Overview by Buck and McFall (2012). For further details refer to the Understanding Society User Guide (waves 1-6) by Knies (2016).
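The attrition implied by these steps can be tallied with a short sketch. The counts are those stated in this appendix; the step labels and the helper function are our own shorthand.

```python
# Sample sizes after each construction step, as reported in this appendix.
steps = [
    ("raw individual survey (wave 8)", 21076),
    ("matched to household survey", 20821),
    ("matched to local authority codes", 17697),
    ("merged with aggregate data", 15844),
]

def attrition(steps):
    """Number of observations lost when moving from each step to the next."""
    return [
        (label, before - after)
        for (_, before), (label, after) in zip(steps, steps[1:])
    ]

losses = attrition(steps)
for label, lost in losses:
    print(f"{label}: {lost} observations dropped")

# Respondents and non-respondents to the EU question should add up to the final sample.
respondents, non_respondents = 14476, 1368
```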

A.3 Regression results
We divide our variables into groupings as follows. The first group of explanatory variables includes basic demographic features such as sex, age, marital status, education and employment. The second group explores data on individuals' use of health services. The third group captures information on housing (ownership vs. renting) drawn from the household questionnaire. The fourth group refers to employment. This is followed by a focus on unearned income and state benefits. The sixth group consists of life satisfaction indicators. The seventh and final group covers nationality and ethnicity.
The results are reported in Tables A.1a to A.10. We present linear probability models as the default, with the exception of logit models in Table A.1b, probit models in Table A.1c and weighted OLS models in Tables A.1d and A.1e. 3 When variables are perfectly comparable at individual and aggregate levels, the first three columns of the tables directly compare those to address the potential ecological fallacy concern.
2 In unreported tables (available upon request), we compare the 14,476 individuals who answer the Brexit question to the 1,368 non-respondents for each group of covariates (i.e. all regressors in Tables A.1a to A.10 in the appendix) and establish along which dimension the two groups are statistically different. If anything, non-respondents seem to display most of the characteristics of a typical Leave voter. More specifically, non-respondents are significantly older, less used to technology, with lower educational attainments and more frequently unemployed. In addition, they seek more medical attention, their housing status is more often local authority renting, and more of them receive income support. Finally, non-respondents are less often UK natives and more often members of an ethnic minority.
3 We would like to note that sampling weights in Understanding Society, which we use in Table A.1d, are quite homogeneous. In our main estimation sample, the median sampling weight is 0.956, the 25th percentile is 0.770 and the 75th percentile is 1.237. This explains why weighted and unweighted regression results are so similar. [USOC wave 8 data has a substantial number of observations with missing weights. This is due to the fact that it is a pre-release version. The final version of wave 8 is expected to be released towards the end of 2018 or in early 2019.] In Table A.1e, we mechanically re-weight the sample to align the share of 'Leave' voters with the actual referendum result.

A.4 Demographics, technology, education and employment
In Tables A.1a  of the elderly population is lower in magnitude, it presents a predictive power very similar to the individual counterpart. Column 4 indicates that males are 4.7% more likely to vote Leave. Compared to middle-aged respondents, the tendency to support Leave is substantially lower by 12.3% for younger cohorts up to the age of 30 and notably higher by 9.1% for individuals aged 60 or above.
Columns 5 and 7 confirm these results in terms of significance even when we control for the share of the population aged 60 or above at the local authority level. In column 6 we focus on technology use. Individuals who do not use a smartphone are substantially more likely to vote Leave. Using the internet every day is associated with a substantially lower probability to vote Leave. These patterns persist even once we control for sex and age in column 7.
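The mechanical re-weighting used for Table A.1e is not described in detail. One simple possibility, shown purely as an assumed sketch, is to scale Leave and Remain respondents by constant factors so that the weighted Leave share matches the referendum result of 51.9%. The function names and the toy sample are hypothetical.

```python
def reweight_to_target(labels, target_leave_share):
    """Constant weights per answer so the weighted Leave share hits the target."""
    sample_share = sum(labels) / len(labels)
    w_leave = target_leave_share / sample_share
    w_remain = (1 - target_leave_share) / (1 - sample_share)
    return [w_leave if y == 1 else w_remain for y in labels]

def weighted_leave_share(labels, weights):
    """Leave share once each respondent counts with their weight."""
    leave_weight = sum(w for y, w in zip(labels, weights) if y == 1)
    return leave_weight / sum(weights)

# Toy sample with roughly the survey's Leave share of 42% (1 = Leave, 0 = Remain).
labels = [1] * 42 + [0] * 58
weights = reweight_to_target(labels, target_leave_share=0.519)
```

The weights sum to the sample size, so the re-weighted regressions keep the same effective number of observations while giving Leave respondents more influence.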
In Table A.2 we explore the predictive power of educational attainment.
Again, variables on educational attainment relate to the referendum outcome in the same way and with matching power at both individual and aggregate levels although aggregate coefficients have lower magnitude and significance. Hence, highly qualified individuals with university and college degrees are considerably less likely to vote Leave by over 20% compared to people with average qualifications. In contrast, having no qualification is a very strong predictor of voting Leave. These results hold up once we control for aggregate characteristics on educational attainment in columns 3 and 5 as well as sex and age in column 6.
Next, in Table A.3 we analyze individuals' current employment and marital status. At the individual level, comparison groups are predominantly retired and divorced respondents, respectively. 4 Here, aggregate rates on employment are indistinguishable from zero (although they have the same predictive power as the individual variables, and self-employment and unemployment coefficients have the 'correct' sign). Column 1 of Table A.3 shows that self-employed and paid employees are more likely to support Remain (relative to mostly retired people). Column 4 shows that single and married people are significantly less likely to vote Leave (compared to divorcees, separated and widowed people). Again, most of these results hold up once we control for aggregate rates in column 3 as well as for age in column 5. Unemployment now also shows up as highly significant. 5 To sum up our results on demographic variables, we find that individuals are more likely to support Leave if they are male, older, use less technology, are less qualified, retired or unemployed, and divorced, separated or widowed.
These findings are consistent with the results by Becker et al. (2017) based on aggregate data who also find that age, low educational attainment and unemployment are key explanatory variables to predict the Leave vote shares across UK voting areas.
4 Excluded categories among current activity feature Retired (64.7%), Looking after family or home (10%), Full-time student (14.3%), Long-term sick or disabled (7.5%), Doing something else (2.2%). Excluded categories among marital status feature Divorced (57.4%), Separated (10.3%), Widowed (31.6%), Other (0.7%).
5 To get a sense of whether changes in (un)employment status matter, in unreported regressions, we used additional information based on a short employment history (looking at respondents participating in both wave 7 and the pre-release version of wave 8 with the EU question). The results suggest that the preferences for Remain and Leave are quite static or do not respond in a remarkable fashion to individuals switching employment status (by becoming unemployed or employed between wave 7 and wave 8). Rather, the first-order differences in tendencies to support Leave or Remain for our prediction exercise are driven by individuals who are employed or unemployed in both survey waves, implying that looking at only the cross-section is sufficient to capture the role of employment variables.

A.5 Health
Table A.4 analyzes the relationship between Brexit support and individuals' use of health services. Interestingly, columns 1 and 2 show that individuals who visit their general practitioner (GP) very frequently (over ten times in the previous 12 months) are more likely to support Leave. Those are arguably individuals of poor health or older generations. Conversely, those who did not visit the GP even once have a slightly higher probability to support Remain.
Controlling for age in column 2 turns the latter result insignificant (possibly because it is young people who do not go to the doctor) but preserves the former result on frequent GP visits.
A similar picture emerges from columns 3 and 4, focusing on individuals who are never or extremely often classified as out-patients. The same holds for people admitted as in-patients at least once during the preceding 12 months.
That is, people of poor health as proxied by frequent visits to the GP or hospital are substantially more likely to support Leave. Perhaps it is therefore no coincidence that a key pledge of the pro-Brexit referendum campaign was to invest more in the National Health Service (NHS).

A.6 Housing
When directly comparing individual tenure status to corresponding aggregate shares we see similar paths (columns 1 to 3), in particular with respect to direct ownership which is positively related to Leave support.
In terms of individual housing tenure, owning their own property tends to make individuals more likely to support Leave, although this particular association is barely statistically significant. The omitted category here is renting through a housing association. More importantly, higher property values are significantly related to an increased likelihood of supporting Remain. A one-standard-deviation increase in property values increases the Remain likelihood by roughly 4%. Property values are arguably positively linked to individuals' financial status, which would be consistent with earlier evidence on income based on aggregate data (see Becker et al., 2017).
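Reading a coefficient as the effect of a one-standard-deviation increase presupposes that the variable was standardized before estimation. A minimal sketch of that transformation follows; the property values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant is applied.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean zero and unit (population) standard deviation."""
    center = mean(values)
    spread = pstdev(values)
    return [(v - center) / spread for v in values]

# Hypothetical property values in thousands of pounds.
property_values = [150, 200, 250, 300, 600]
z_scores = standardize(property_values)

# A coefficient on z_scores is then the change in the outcome associated with
# moving one standard deviation (here pstdev(property_values)) in raw units.
one_sd_in_raw_units = pstdev(property_values)
```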

A.7 Employment
This section shifts the focus towards employment-related determinants. For starters, Table A.6 indicates a higher probability of almost 10% to support Leave for individuals who did not work in the week prior to the questionnaire and who did not have a paid job compared to those respondents who were either working or had a paid job (stable across all specifications).
In Table A.7 we narrow our analysis to only those participants who worked or had a paid job. This reduces the number of observations to 8,434. First, columns 1 to 3 compare the individual sector of employment to the respective aggregate controls (manufacturing, construction, retail and finance as used in Becker et al., 2017). Estimates as well as their predictive power are aligned (although aggregate coefficients are lower in magnitude). Indeed, both specifications suggest that workers in the manufacturing, construction and retail industries are significantly more likely to support Leave. Note that individual estimates are fairly stable across all specifications.
In addition, it emerges from column 4 that those with a permanent job compared to those in non-permanent employment have a higher probability of supporting Leave. This result continues to hold qualitatively in column 5 after we control for individuals' age, sex and education as well as the sectoral distribution and growth of employment at the aggregate level in column 6. This result appears surprising, but we note that the subsample in Table A.7 is highly unbalanced in the sense that 90% of the respondents have a permanent job. Still, 60% of individuals with permanent jobs support Remain versus 70% of those with temporary jobs. It also appears likely that the very young respondents, who are overwhelmingly in favor of Remain, are less likely to hold permanent jobs. Our age dummies in column 5 might not pick up these age patterns appropriately. Finally, self-employed respondents are also more likely to support Leave, even though this association is insignificant for most specifications in the table.
Overall, consistent with the aggregate results in Becker et al. (2017) our findings support the view that individuals are more willing to vote for Brexit if they work in sectors such as manufacturing that have arguably been hit relatively hard by trade openness and international competition (also see Colantone and Stanig, 2017). In addition, workers in manufacturing, construction and retail sectors have lower educational attainment on average while the opposite is true for workers in the financial sector.

A.8 Unearned income and state benefits
In Table A.8 we highlight the role of unearned income and state benefits. In column 1 we find that respondents who receive core benefits have a significantly higher probability of supporting Leave compared to those receiving none. These core benefits are broken down into their various components in column 2. In particular, recipients of income support are substantially more likely to be in favor of Leave (by 20%), whereas job seeker's allowance, child benefit and universal credit do not matter.
Similar results hold for people receiving pensions. This particular finding is likely driven by the overwhelming share of older people amongst pension receivers (see section A.4). The same pattern holds for people on disability benefits, in line with our estimates on health service usage (see section A.5).
Finally, the opposite is true for respondents who receive other sources of income. Those are broken down in column 3. The key income streams are education grants and student loans as well as payments from family members living elsewhere. This suggests a tight link with age and education, as discussed previously (see section A.4).
In summary, the forms of income and benefits in Table A.8 are likely correlated with more fundamental characteristics such as age and health, as discussed in previous tables.

A.9 Life satisfaction
In Table A.9 we explore the potential link between Brexit support and indices of health, income and life satisfaction. When looking at overall life satisfaction only (columns 1 to 3), the individual coefficients suggest that dissatisfied people are significantly more likely to favor Leave while the aggregate estimate implies that a higher relative dispersion of well-being across voting areas, which can be interpreted as a measure of life satisfaction inequality, has positive predictive power for the Leave support. Success rates of prediction are very similar whichever level of variation is considered.
In addition, people dissatisfied with health and income have a higher probability of supporting Leave by 5.5% and 6.4%, respectively. Once again, we can relate these findings to those in Table A.4 on health and Table A.8 on income and benefits. Interestingly, people dissatisfied with their amount of leisure time are significantly more likely to support Remain by 6.3%. This may be linked to the fact that these respondents have on average higher levels of educational attainment and they are generally younger. Note that when these individual variables are considered (columns 4 and 5) the individual estimate of overall life satisfaction is absorbed and becomes insignificant.
(OLS). Non-dummy variables are standardized. Authority-level clustered standard errors are presented in parentheses, asterisks indicate *** p<0.01, ** p<0.05, * p<0.1.