Using social media audience data to analyse the drivers of low-carbon diets

Low-carbon lifestyles are key to climate change mitigation, biodiversity conservation, and keeping the Earth in a safe operating space. Understanding the global feasibility and drivers of low-carbon lifestyles requires large scale data covering various countries, demographic and socioeconomic groups. In this study, we use the audience segmentation data from Facebook’s advertising platform to analyse the extent and drivers of interest in sustainable lifestyles, plant-based diets in particular, at a global level. We show that formal education level is the most important factor affecting vegetarianism interest, and it creates a sharper difference in low-income countries. Gender is a strong distinguishing factor, followed by national gross domestic product per capita and age. These findings enable upscaling local empirical studies to a global level with confidence for integrated assessments of low-carbon lifestyles. Future studies can expand this analysis of social media audience data to other consumption areas, such as household energy demand, and can also contribute to quantifying the psychosocial drivers of


Introduction
Low-carbon lifestyles, comprised of sustainable choices in various consumption areas from food to energy, are considered a key mitigation option to tackle climate change [1,2]. Besides lowering the resource demand and greenhouse gas emissions [3], lifestyle change has a strong potential to limit environmental pressure [4,5], to create co-benefits for multiple sustainable development goals (SDGs) regarding public health, poverty and biodiversity [3,6,7], and to reduce the intensity of SDG tradeoffs [8].
Achieving the full potential of lifestyle change requires widespread societal transformation. The feasibility of this transformation and how it can be facilitated is yet unknown, because lifestyle change is a complex phenomenon driven by various social, economic, cultural and psychological factors. Quantitative scenario analyses that explore the contribution of lifestyle change to climate mitigation and sustainable development urgently need to address this complexity. However, the lack of large scale data about the societal heterogeneity of pro-environmental consumption behaviour hinders such quantitative integrated analyses on the feasible potential of lifestyle change.
Theoretical and empirical studies provide a growing understanding of pro-environmental behaviour [9-14], and hence shed light on the bottom-up feasibility of lifestyle change. However, such empirical studies are limited temporally, geographically and contextually [10,12]. In other words, they are based on case studies and surveys that are conducted in a limited number of countries at a particular time, for a particular lifestyle domain and from a particular disciplinary perspective such as behavioural economics or environmental psychology [15,16]. Therefore, such empirical studies may not allow the generalization and large-scale experimentation needed to understand the feasible mitigation potential and key drivers of lifestyle change, especially in interdisciplinary studies such as integrated assessment modelling [17]. Furthermore, most empirical studies are bounded by self-reported data, which may be biased by response styles such as socially desirable or acquiescent responding [18,19], and hence often differ from actual actions and consumption behaviour, which can be better measured by observed data [12].
Big data sources, i.e. individuals' and households' online data footprint, can address these limitations of conventional data sources by helping to understand personal carbon footprints, lifestyle change tendencies and their drivers [20]. Online social media (OSM) data, as a publicly available big data source, is particularly promising since it can shed light on socioeconomic, demographic, cultural and even psychological drivers of consumption and lifestyle change. OSM data has been used to analyse several social phenomena such as epidemics [21,22], obesity prevalence [23], food choices [24], human migration [25-27], disaster damage and risk perception [28], and gender inequality [29,30]. However, the use of OSM data for estimating the demand and understanding societal heterogeneity in key consumption sectors behind environmental degradation has been limited to a few studies in the transport sector [31,32].
OSM data provides large-scale information harmonized across different countries that cannot be obtained from surveys with inevitably limited sample size. OSM data might reflect 'observed' data as opposed to self-reported data, since it is based on users' posts, activities, purchases and other online behaviour. Still, social media data has several limitations. It is biased towards the users of these online platforms and may not represent the entire population. Accessing the commercial platforms for data collection may not be straightforward. Because the publicly available data is aggregated from individual data by black-box algorithms, it may not fully and transparently represent actual consumption behaviour; online and offline behaviour can also differ. Therefore, OSM data is a promising source to investigate and quantify global societal trends and heterogeneity behind lifestyle change for demand-side climate change mitigation, but its usability in this context should be investigated due to these potential limitations.
The objective of this paper is to explore the usability of OSM data to analyse the drivers of low-carbon lifestyles and identify the relative impact of demographic factors such as age, gender and education level on population-wide lifestyle change interest. We particularly demonstrate how Facebook audience segmentation data published for advertising purposes can be used to quantify the societal heterogeneity of global interest in low-carbon diets. For this purpose, we created a dataset of daily and monthly active users (DAU and MAU) marked by pre-defined interests in sustainable lifestyles, particularly vegetarianism. We retrieved publicly available and anonymous data from the Facebook Marketing application programming interface (API) as described in section 2. We collected the audience size data at multiple points in time between September 2019 and June 2020, for each interest category, age, gender, education level and country. This dataset covers 131 countries and around 1.9 billion people as the total Facebook audience size in those countries, 210 million interested in vegetarianism, and 33 million interested in sustainable living (supplementary table 1 (available online at stacks.iop.org/ERL/16/074001/mmedia)).

Methods and data
To explore the usability of OSM data, we first collected the audience segmentation data from the Facebook Marketing API. The lack of large-scale and reliable survey data on the interest in low-carbon lifestyles impedes a precise validation. Still, as an initial validation step, we compare the Facebook audience size data to the limited empirical data from scientific and market research surveys, to Google Trends data as another indicator of online interest, and to food consumption trends based on the UN Food and Agriculture Organization's (FAO) statistics. We then analyse the relationship between the Facebook audience's interest in low-carbon diets, GDP per capita and mean years of schooling (MYS) at the country level using multiple linear regression. Lastly, we identify the key drivers of interest in low-carbon diets based on the granular Facebook data using machine learning (ML) techniques. We describe these data and methods below.

Data collection

Facebook
We collected the Facebook audience size data using a Python interface called pySocialWatcher [33] to the Facebook Marketing API [34]. The audience size data is freely available to any registered advertiser on Facebook, and Facebook Marketing API includes only aggregated and publicly available data. Therefore, we had no access to and we did not use any personal information in this study.
The Marketing API allows targeting specific population groups with queries on demographic factors such as age, gender, location, education level, and interest categories that refer to social, economic, and cultural interests like soccer, yoga or agriculture. While the demographic factors are mostly user-defined, interests are inferred by Facebook algorithms according to what people share on their timelines, apps they use, ads they click, pages they like and other activities related to things like their device usage and travel preferences [35].
In this study, we chose two interest categories relevant for low-carbon lifestyles, vegetarianism and sustainable living. We determined these interest IDs based on a keyword search on the Marketing API for available interest categories. For instance, a query with the keyword 'vegetarian' returns the interest categories vegetarianism, vegetarian cuisine, lacto vegetarianism etc. We chose vegetarianism and sustainable living since they are the ones with the highest global audience size among the interest categories returned for respective searches. Supplementary table 1 shows the interest IDs and global audience sizes returned by a keyword search. In addition to the audience sizes of specific interests, we collected the total audience size data for each demographic group (age, gender, education, country) without any interest constraint so that the fractional interest of this demographic group in a subject could be calculated.
Our choice of the keyword vegetarianism was motivated by the breadth of the term compared to plant-based diets or sustainable diets, and its availability as a pre-defined interest on the Facebook advertising platform. Interest in vegetarianism can be motivated by different reasons, such as animal welfare, health and religion. Therefore, vegetarianism interest analysed in this study is an indicator of the spread of meat-free diets, more relevant for estimating food demand, but not an indicator of vegetarianism interest only for pro-environmental reasons.
The Marketing API returns two metrics for audience size: DAU and MAU. MAU can be a better estimate of the target group since not everyone uses Facebook every day. However, the Marketing API returns rounded numbers for MAU, for instance 1000 by default for very small audience sizes that have zero DAU. This potentially leads to an overestimate of the actual audience size. Therefore, we use DAU as the metric for audience size throughout this study.
The audience size returned by the Marketing API reflects the present use of Facebook and does not include a temporal dimension. To account for the changes over time either in the actual user interests or in the definition of these interests in the Facebook algorithms, we collected the audience size data in September 2019, January 2020 and June 2020. While the January dataset includes only the total audience sizes in each country, the September and June datasets are disaggregated by age, gender and education. Supplementary figure 1 shows that both absolute and fractional audience size for vegetarianism increased from September to June in almost all of the 50 countries with the highest fractions of vegetarianism interest. Table 1 summarizes the dimensions and size of each audience size dataset. For instance, the June data contains the audience size for two interest groups, for 11 age cohorts, 2 genders, 6 education levels, and 131 countries. This corresponds to 11 × 2 × 6 × 131 = 17292 data points for each interest.
Facebook data is biased towards internet users, for instance young, urban and educated demographic groups, and hence may not represent the entire population. In social science studies where individual participants were recruited through Facebook questionnaires, this bias was found to be non-significant [36]. In recent studies that use the audience segmentation data, though, Facebook audience size is often corrected with the penetration rate, i.e. the fraction of the actual population that is active on Facebook [37,38]. We do not use a correction factor in this study since we do not aim at prediction, and the metric we use is not the audience size but the fraction with a specific interest.
However, to avoid overconfidence in Facebook data as a representative of offline behaviour, we exclude the countries where Facebook penetration is low. Figure 1 illustrates the distribution of total Facebook audience size across 131 countries, namely the daily active users (DAU_j) and the penetration rate (p_j) calculated as in equation (1). For the penetration rate, we take the population aged 15 and over (Pop_j) in equation (1) to correspond to the reported age cohorts of the Facebook audience. We exclude the countries where the penetration rate is below its 25th percentile (0.24) and the total audience size is also below its 25th percentile (1.6 million). With this choice, we leave out the countries where the Facebook audience represents less than 24% of the population, yet we keep the ones where the audience size is still considerable (above 1.6 million) even though penetration is low. This choice leads to 16 countries being excluded (j*), and 115 being included (j) in our analysis. Equation (2) denotes the subset of chosen countries, where η symbolizes the percentile function.
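The exclusion rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `select_countries`, the six countries and all of their numbers are made up, and the `inclusive` interpolation method for the 25th percentile is an assumption, since the text does not say how the percentile was computed.

```python
from statistics import quantiles


def select_countries(dau, pop15):
    """Return the countries kept for analysis and their penetration rates.

    dau   : dict country -> total daily active users (DAU_j)
    pop15 : dict country -> population aged 15 and over (Pop_j)
    A country is dropped only if BOTH its penetration rate and its
    audience size fall below the respective 25th percentile.
    """
    penetration = {c: dau[c] / pop15[c] for c in dau}
    # First quartile of each quantity; the interpolation method is an assumption.
    p25 = quantiles(penetration.values(), n=4, method="inclusive")[0]
    dau25 = quantiles(dau.values(), n=4, method="inclusive")[0]
    kept = [c for c in dau if penetration[c] >= p25 or dau[c] >= dau25]
    return kept, penetration


# Made-up numbers for six hypothetical countries.
dau = {"A": 9_000_000, "B": 400_000, "C": 300_000,
       "D": 700_000, "E": 5_000_000, "F": 2_000_000}
pop15 = {"A": 100_000_000, "B": 2_000_000, "C": 30_000_000,
         "D": 1_000_000, "E": 8_000_000, "F": 5_000_000}

kept, penetration = select_countries(dau, pop15)
print(kept)
```

In this toy input, country A has low penetration but a large audience and is therefore kept, while country C is below both thresholds and is dropped.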
Thus, the Facebook audience fraction for each interest, e.g. vegetarianism, in each country (F_{i,j}) is the DAU with this interest in each country (DAU_{i,j}) divided by the total DAU in that country (DAU_j), as denoted in equation (3). Equation (4) shows the audience fraction at higher granularity for each demographic group.
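Written out from the definitions in the text, equations (1)-(4) can be read as follows. This is a reconstruction from the prose, not necessarily the authors' exact notation: the demographic indices a, g and e (age cohort, gender, education level), the dummy index c and the symbol η_25 for the 25th percentile are assumptions introduced here.

```latex
\begin{align}
p_j &= \frac{\mathrm{DAU}_j}{\mathrm{Pop}_j} \tag{1}\\
j &\in \left\{\, c : p_c \ge \eta_{25}(p) \ \lor\ \mathrm{DAU}_c \ge \eta_{25}(\mathrm{DAU}) \,\right\} \tag{2}\\
F_{i,j} &= \frac{\mathrm{DAU}_{i,j}}{\mathrm{DAU}_j} \tag{3}\\
F_{i,j,a,g,e} &= \frac{\mathrm{DAU}_{i,j,a,g,e}}{\mathrm{DAU}_{j,a,g,e}} \tag{4}
\end{align}
```

Under this reading, a country is excluded only when both conditions in (2) fail, which matches the 16 excluded and 115 included countries reported in the text.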

Surveys
We compare the vegetarian interest of the Facebook audience to available empirical data about the vegetarian population in 30 countries (figure 3(a)). We compiled this survey data from online news, NGO or market research articles based on queries on the Google search engine in English, following the references in the news articles with a snowball approach, and from scientific literature that cites these market research articles. When the original sources could not be reached, we repeated the search engine query in the local language of the corresponding country using online translation. As listed in supplementary table 2, the collection year and sources vary across countries, and the online articles often do not give a reliable citation. We did not leave out such sources but recorded the lack of reliable original sources. Therefore, while using this survey list as the best available knowledge for comparison purposes, we note that it is not accurate and fully reliable for some countries, due to the differences in data collection time and method, and the lack of citations to actual data sources.
In figure 3(a) we use the Facebook audience data from September 2019 (the oldest dataset) since the empirical data is relatively old (supplementary table 2). We report the Spearman correlation coefficient as the r-value and the two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero, based on a Wald test with t-distribution, calculated using SciPy's linear regression (stats.linregress) [39].
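To make the comparison statistics concrete, the following dependency-free sketch computes a Spearman rank correlation and a mean absolute percentage error. It is an illustration, not the authors' SciPy-based procedure, and the helper names (`ranks`, `pearson`, `spearman`, `mape`) are chosen here. The three input pairs are the survey and Facebook values quoted in the Results for Germany, Switzerland and the US; with only three countries the output says nothing about the full 30-country comparison.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mape(observed, estimated):
    """Mean absolute percentage error of `estimated` relative to `observed`."""
    errors = [abs(e - o) / abs(o) for o, e in zip(observed, estimated)]
    return 100 * sum(errors) / len(errors)


# Survey vs Facebook vegetarian fractions (%) for Germany, Switzerland, US.
survey = [7.6, 11.0, 7.9]
facebook = [8.0, 10.0, 9.0]

rho = spearman(survey, facebook)
error = mape(survey, facebook)
print(rho, round(error, 1))
```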

Google Trends
We check the consistency of Facebook data on vegetarianism interest with another online activity indicator, the Google Trends data (figure 3(b)). Google Trends reports the interest in a topic specified by a keyword globally over time, or across all countries at the present time. The interest value aggregates Google search volume, and is reported as an index relative to the categories in the inquiry. We downloaded the Google Trends data in January 2020 for the topics Vegetarianism, Sustainable diet, Sustainable lifestyles, Sustainable living and Plant-based diet for 126 countries using the Python package pytrends [40]. Global interest in vegetarianism, sustainable diets and plant-based diets has substantially increased in the last 10 years, whereas the interest in sustainable living and sustainable lifestyles has declined (supplementary figure 3). Since Facebook data measures the audience size and Google Trends data measures relative search volumes, in figure 3(b) we compare not these two metrics themselves but the ranking of the 126 countries according to each of them.

Socioeconomic indicators
We analyse the correlation of Facebook data to the country-level socioeconomic indicators such as gross domestic product (GDP) per capita, MYS, and average meat consumption per capita. GDP data is obtained from the World Bank statistics [41] for the year 2018, and it is in real USD per capita. MYS data, specifically the MYS by Broad Age for the population aged 15 and older, is obtained from the Wittgenstein Centre Data Explorer [42] for the year 2015, the most recent available year. The average meat consumption per capita in 2017 is obtained from the FAO Food Balance Sheets [43], covering the domestic supply of all meat products. It corresponds to the element Food supply quantity (kg capita⁻¹ yr⁻¹) and the aggregated item Meat (Total) in the FAOSTAT database. We tag each country in the Facebook dataset with its geographic region as defined in the MESSAGE integrated assessment modelling framework [44]. Table 2 lists the region acronyms used throughout this study, and supplementary figure 14 visualizes the regions.

Multiple linear regression of country-level indicators
We analyse the dependency between the Facebook audience fraction interested in vegetarianism, meat consumption and other socioeconomic indicators at the country level (figure 4) using a multiple linear regression model denoted in equation (5). We opt for linear over nonlinear regression for ease of interpretation of the results. The dependent variable F_{veg,j} is the audience fraction as defined in equation (3), while the independent variables are GDP per capita (GDP_j), mean years of schooling (MYS_j) and meat consumption per capita (M_j) of each country. We test an alternative model without a constant, and it leads to a slightly worse fit (R² = 0.526) than the model with a constant (R² = 0.54), as supplementary figure 9 demonstrates. To fit this regression model to the data, we use ordinary least squares (OLS) as implemented in the Python package StatsModels [45].
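Based on the variables named above, the regression model of equation (5) can be written as below. The coefficient symbols β_0 to β_3 and the error term ε_j are notation assumed here; the alternative model without a constant corresponds to fixing β_0 = 0.

```latex
\begin{equation}
F_{\mathrm{veg},j} = \beta_0 + \beta_1\,\mathrm{GDP}_j + \beta_2\,\mathrm{MYS}_j + \beta_3\,M_j + \varepsilon_j \tag{5}
\end{equation}
```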

Feature importance based on regression tree models
We employ ML models to identify the relative importance of demographic factors (age, gender, education, location) included in the granular Facebook audience segmentation data. This choice of ML models is motivated by the limitations of statistical models to address the collinearity between the input factors such as income and education. We use a regression tree model that can address this drawback of statistical models not for prediction purposes but to quantitatively link the input features (demographic and socioeconomic factors) to the output (audience fraction) and to use the interpretation functionality of this model. Below, we describe the XGBoost learning algorithm that we used to build a regression tree model, and Shapley additive explanation values that we used for calculating feature importance on this model.

XGBoost learning algorithm
XGBoost is an ensemble learning method based on gradient-boosted decision trees [46], meaning that the tree ensemble is formed by additive training where each new tree is fit to the data considering what has been learned in the previous steps, as opposed to random forests where each tree is fit by random bagging of the training data. XGBoost is shown to provide robust performance, accuracy and computational efficiency on classification and regression tasks compared to linear regression and deep learning methods [46,47]. XGBoost has been widely used in scientific applications from disease diagnosis in healthcare [48,49] to environmental pollution prediction [50,51]. We choose to use tree-based methods in general and XGBoost in particular for two reasons. First, the relationship between demographic factors and the audience fraction is nonlinear, as supplementary figure 8 shows, and there are potential collinearities between these factors. For instance, education level is dependent on age to some extent since higher educational attainment takes time, and the country and education level may be dependent due to the GDP and MYS relationship shown in supplementary figures 6 and 7. Tree-based methods address these nonlinearities and collinearities better than linear regression. Second, we use these ML methods to identify the factor importance, not for prediction. Therefore, the explainability of tree-based methods provides an advantage for our purpose. XGBoost, in particular, is chosen for its superior performance over other tree-based methods. Supplementary figure 10 illustrates a comparison of linear regression, random forest and XGBoost on our dataset, with XGBoost leading to the lowest mean squared error (MSE), hence the best accuracy.
For implementation, we use the Python implementation of XGBoost after splitting the data into a 75/25 training/test set with random shuffling. Although overfitting should be avoided in training tree models since it can cause conservative predictions, we aimed at a low MSE between the test data and model predictions since we use the model for now-casting, i.e. explaining the present data. We iterated over different parameter values of the XGBoost algorithm, such as the learning rate, maximum tree depth, objective function and tree method. Supplementary figure 11 shows the model fit for the two options with the lowest MSE. We obtained the lowest MSE with a learning rate of 0.5, a maximum depth of 9, an objective function based on logistic regression and the tree construction method set to 'auto', which uses a heuristic to choose the fastest tree construction method from the available options. In the feature importance analysis, we use the model with the lowest MSE resulting from these specifications.
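The additive training idea described above can be illustrated with a deliberately small, dependency-free sketch. This is not XGBoost, which uses deeper trees, regularisation and second-order gradient information; it is plain gradient boosting with one-split regression stumps under squared loss. The function names and the toy data are made up, and the learning rate of 0.5 merely mirrors the value reported in the text.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on one feature (least squares).

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, rounds, learning_rate=0.5):
    """Additive training: each stump is fit to the residuals of the ensemble so far."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    mse_history = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
        mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    return pred, mse_history


# Made-up data: a nonlinear relation between one ordinal feature and a fraction.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0.02, 0.03, 0.05, 0.09, 0.12, 0.11, 0.06, 0.04]

pred, mse_per_round = boost(x, y, rounds=20)
print(mse_per_round[0], mse_per_round[-1])
```

Each round reduces the training error because the new stump is fit to what the previous rounds left unexplained, which is the contrast with the independently bagged trees of a random forest.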

Shapley additive explanation (SHAP) values
To estimate the importance of demographic factors for the audience fraction interested in low-carbon lifestyles, we used Shapley additive explanation values [47]. Shapley values originate from game theory, where they are used to calculate the individual contributions to the cooperative payoff in an n-player game regardless of the order of coalition formation [52]. They are adopted in ML since they meet all desirable properties of an explanation model, i.e. a model that is used to explain the behaviour of a prediction model based on the individual contribution of input factors (features) [53]. Compared to other metrics used in ML, Shapley additive explanation values provide more robust conclusions for feature importance [47], since they can better account for high order interactions, correlations and categorical features with highly imbalanced classes, as we have in our dataset.
To calculate the Shapley values on the tree model we generated with XGBoost and visualize the results, we use the tree explainer feature of Python package SHAP [47,53] and its supporting visualizations.
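The game-theoretic definition above can be made concrete with a brute-force sketch that averages each feature's marginal contribution over all orderings. This is only an illustration of the concept: the SHAP tree explainer computes these values efficiently for tree ensembles rather than by enumeration, and the toy `value` function, its numbers and the three feature names are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    v = 0.05  # baseline prediction
    if "education" in coalition:
        v += 0.04
    if "gender" in coalition:
        v += 0.02
    if {"education", "gender"} <= coalition:
        v += 0.01  # interaction, split equally between the two features
    return v


features = ["education", "gender", "age"]
phi = shapley_values(features, value)
print(phi)
```

The values add up to the difference between the full prediction and the baseline, which is the additivity property that makes Shapley values suitable as an explanation model; the feature that never changes the output ("age" in this toy case) receives zero.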

Results
We consider the fraction of the Facebook audience interested in low-carbon lifestyles as a proxy for the spread of this phenomenon in each country where Facebook penetration is relatively high, and as a cross-country comparison indicator. Figure 2 visualizes the relative spread of sustainable living interest and vegetarianism interest of the Facebook audience in 115 countries in January 2020. Australia, New Zealand, Sweden and Denmark have the highest ranks for the sustainable living interest (4.9%-3.5%), with a mean value of 0.7% across all countries. Vegetarianism interest is most common among the Facebook audience in Singapore, Sweden, Finland and Israel (∼18%), with a mean value of 7.5% across all countries. Although some countries, such as the Scandinavian countries, show relatively high interest in both sustainable living and vegetarianism, the two phenomena do not strongly correlate (supplementary figure 2). In other words, a country-wide interest in sustainable living does not imply a country-wide interest in vegetarianism as an indicator of sustainable diets, or vice versa.

Data consistency
We investigate the consistency of Facebook data with other offline and online sources through a series of comparisons shown in figure 3. To our knowledge, there is no large-scale data available about sustainable living interest. Therefore, we perform these comparisons only for the interest in vegetarianism. Figure 3(a) illustrates the Facebook audience fraction interested in vegetarianism in N = 30 countries with respect to the survey results about the vegetarian population fraction in those countries. The statistical measures do not indicate a strong consistency between the two datasets, with a small and negative correlation (r = −0.18), a high p-value (0.335) for a null hypothesis that the slope of the linear regression is zero, and a high mean absolute percentage error (181%). It should be noted, however, that the available survey data is on average 4 years older than the Facebook data, and is based on limited sample sizes and different collection methods (see supplementary table 2). The deviation of Facebook data from the empirical data is smaller for a few countries with recent surveys. For instance, according to the data downloaded in January 2020, the audience fraction interested in vegetarianism is 8%, 10% and 9% in Germany, Switzerland and the US, respectively. Empirical surveys reported the fraction of vegetarians as 7.6% in 2016 in Germany [54,55], 11% in 2017 in Switzerland [56], and 7.9% in 2016 in the US [57]. Figure 3(b) compares the ranking of the 110 countries that are common to both the Facebook and Google Trends country lists with respect to vegetarianism interest on these two online platforms. Despite discordances between the two data sources, the results show a relatively strong positive correlation (r = 0.49), a statistically significant linear relationship (p < 0.001) and a smaller mean absolute percentage error (MAPE = 100%). Therefore, Facebook audience data is more coherent with Google Trends, another indicator of online activity, than it is with offline empirical data.
Figure 3(c) compares the Facebook interest in vegetarianism to meat consumption in 114 countries. The results show a strong positive correlation (r = 0.64) and a statistically significant positive linear relationship (p < 0.001). This finding is counterintuitive, because if the Facebook interest in vegetarianism is an indicator of actual interest in vegetarianism, one could expect meat consumption to be low in the countries with high vegetarianism interest. However, meat consumption is stated to be highly dependent on income both at an individual and national level [58,59], while vegetarianism interest is also linked to high income and education at an individual level [57,60,61]. Therefore, the positive relationship in figure 3(c) can be attributed to common underlying factors, as we discuss in more detail below. Still, in the countries where the Facebook vegetarianism interest is high, we observe a negative relationship between the vegetarianism interest and the trend of meat consumption between 2014 and 2017 (figure 3(d)). In other words, in countries with high vegetarianism interest, meat consumption per capita declined between 2014 and 2017. This negative relationship visualized in figure 3(d) is present even if Lebanon, the outlier country with the lowest average fractional change in meat consumption, is removed (see supplementary figure 4 for the correlation statistics when Lebanon and other outliers are removed, and supplementary figure 5 for the relationship between Facebook data and meat consumption trends in all countries). Therefore, even though the Facebook audience data does not fully align with the empirical surveys and actual consumption for the reasons we have discussed, it captures the consumption trends, especially in countries where meat consumption has declined and Facebook vegetarianism interest has been high.
We test the relation between Facebook vegetarianism interest and its potential country-level predictors, namely meat consumption, GDP per capita and education (MYS), in a multiple linear regression model (see section 2). According to the results in figure 4, the three factors explain 54% of the variation in the Facebook vegetarianism interest (R² = 0.54). The relationship between the Facebook audience fraction and the three factors is positive, and it is significant for each factor except GDP per capita (p < 0.05). Education (MYS) appears as a more important predictor than income (GDP per capita) and meat consumption. However, the high correlation between MYS, GDP and meat consumption (supplementary figures 6 and 7) and a high variance inflation factor (VIF) for meat consumption (figure 4(d)) indicate multicollinearity in this dataset, for instance through the effect of education on income and hence on meat consumption. Therefore, we conclude that the positive correlation between Facebook vegetarianism interest and meat consumption reflects shared underlying factors. In order to better understand the relationship between vegetarianism interest and the socioeconomic and demographic factors, and to derive a robust ranking of those factors despite their multicollinearity, we analyse the granular Facebook data for each audience group using ML techniques.

Importance of demographic factors for low-carbon lifestyle interest
The Facebook dataset includes audience groups defined by four factors: gender, age cohort, education level and country. No information on income is available in the Facebook audience data, therefore we cannot include the income factor at the same granularity level. Still, due to the strong correlation of GDP per capita to meat consumption and vegetarianism interest at the country level (supplementary figure 7), we add GDP per capita as an additional factor to account for the country-level income. We also add the geographic region of each country, assuming cultural similarity among the countries in each region. We then identify the relative impact of each factor on the audience fraction interested in vegetarianism by building a regression tree model on the dataset of N = 12884 audience groups, and computing Shapley additive explanation values on this model (see section 2).

Figure 4 (caption excerpt). In plots (a)-(c), x and y axes show the values normalized according to the maximum of each, and brought to the (0-1) range for comparison. Solid blue lines are the simple linear regression results, with the 95% confidence interval marked by the shaded area. The red lines depict the multiple linear regression results for each independent variable in an isolated way (slope * x). The table in (d) summarizes the multiple linear regression results (R² = 0.54): the column coef lists the regression coefficients for the three predictors, std err their standard errors, t and P>|t| the t-statistics and p-values for these coefficients, followed by the lower and upper bounds of the 95% confidence interval (the 2.5% and 97.5% quantiles). VIF shows the variance inflation factor for each predictor. The high VIF values for meat consumption and MYS indicate high multicollinearity.
Education is the most important driver of vegetarianism interest in the Facebook audience, followed by gender, GDP per capita, age and region (figure 5(a)). This finding resonates with the empirical studies which found that vegetarians tend to be more highly educated than meat-eaters [60-63]. The relationship between education and vegetarianism interest is nonlinear, though: high education levels lead to either a strongly positive or a strongly negative impact on vegetarianism interest. This is demonstrated by the bimodal distribution of the individual importance metrics of each data point (individual SHAP values), with high education values on the two ends, in the top row of figure 5(b). Complementing this distribution of education impact, figure 6(a) shows the impact of each education level depending on the GDP per capita of the country. These figures highlight a dual effect of education. From high school to university graduates, the impact of education on the Facebook vegetarianism interest is increasing, but it is much lower among the professional and doctorate degree holders (supplementary figure 8(b)). There are two possible explanations for this. First, it can be attributed to the increase in income as the education level increases, since income correlates with high meat consumption, as has long been known [58,59]. Second, it can be due to the weak representativeness of these groups in the Facebook audience data, since doctorate graduates constitute a very low fraction of the population (1.1% in OECD countries on average [64]). Therefore, the vegetarianism interest in these small educational attainment groups should be further investigated.
Gender has a very distinctive impact on vegetarianism interest, with females leaning towards a higher interest in vegetarianism (figure 5(b), 2nd row). This finding is also supported by the available empirical studies [60,62,65]. Therefore, the Facebook audience data complements and supports the local empirical findings by covering a much larger population. The impact of age as a driver of vegetarianism interest is slightly lower than that of gender, and the distinction between young and old is not as clear. Empirical studies [57,60] state that the youth have a wider interest in vegetarianism. In figure 5(b) (4th row), red points representing older age cohorts tend to accumulate around negative Shapley values, hence lower vegetarianism interest, whereas the positive Shapley values coincide with younger age cohorts. However, the youngest age cohort makes the most negative impact on vegetarianism interest, implying that vegetarianism interest is not high among the very young Facebook audience. Supplementary figure 8(a) supports this finding, as the age cohort 15-19 has the minimum average audience fraction interested in vegetarianism.
[Figure caption] The higher the SHAP value of a factor, the higher the vegetarianism interest. In (b), each dot refers to a data point, which is a demographic group defined by age cohort, gender, country and education level. The data points are stacked vertically to show the density, and coloured according to the feature value. For education level and age cohort, red points refer to higher education levels and older ages, respectively, whereas blue is for lower education and younger age cohorts. For gender, red refers to males and blue refers to females. For countries, the countries that are last in alphabetical order are marked with red. Similarly, the red-blue colour scale for GDP refers to the high-low spectrum. The 11 regions are shown on an additional colour bar on panel (b). See section 2 for the definition of regions. The figure is created using the Python package shap [47].
GDP per capita has a positive effect on vegetarianism interest on average, with high GDP values leading to a positive impact and low values leading to a negative impact (figures 5(a) and (b)). In other words, GDP per capita and the Facebook vegetarianism interest move in parallel, as demonstrated before, with a few exceptions caused by regional differences, such as SAS countries having a low GDP but high vegetarianism interest. GDP per capita also interacts with the effect of education level, as illustrated in figure 6(a). Being a college or high school graduate has a higher impact on vegetarianism interest in low-income countries than in high-income countries, and a higher impact than other education levels. In the doctorate and professional degree groups, living in a high-income country results in a negative impact on vegetarianism interest (see the lowest part of the doctorate degree column in figure 6(a)). These findings indicate that general assumptions, such as a steady positive relationship between vegetarianism and GDP per capita or education, do not hold globally. Heterogeneity across countries should be taken into account not only for individual effects, e.g. that of education on vegetarianism, but also for the interaction of effects.
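The idea of an interaction between education and country income can be illustrated with a simple difference-in-differences contrast: the gap between two education levels is computed within each income group, and the two gaps are then compared. The Python sketch below is a hedged illustration under stated assumptions; the group labels, the helper functions and all numbers are invented for this example and are not results from the study.

```python
def education_gap(values, income_group, higher, lower):
    """Difference in the outcome between two education levels within one income group."""
    return values[(income_group, higher)] - values[(income_group, lower)]


def interaction_contrast(values, higher, lower, group_a, group_b):
    """How much the education gap in group_a exceeds the education gap in group_b."""
    return (education_gap(values, group_a, higher, lower)
            - education_gap(values, group_b, higher, lower))


# Invented audience fractions interested in vegetarianism (illustration only,
# NOT data from the study), keyed by (income group, education level).
fractions = {
    ("low_income", "college"): 0.12,
    ("low_income", "no_degree"): 0.04,
    ("high_income", "college"): 0.10,
    ("high_income", "no_degree"): 0.07,
}

contrast = interaction_contrast(
    fractions, "college", "no_degree", "low_income", "high_income"
)
print("education gap, low income: ",
      education_gap(fractions, "low_income", "college", "no_degree"))
print("education gap, high income:",
      education_gap(fractions, "high_income", "college", "no_degree"))
print("interaction contrast:", contrast)
```

A positive contrast in this toy setting means that education separates the audience more sharply in the low-income group than in the high-income group, which is the qualitative pattern described for figure 6(a). A contrast of zero would mean that the two effects are purely additive.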
Geographic regions also play a distinct role in the Facebook audience's vegetarianism interest. Western Europe (WEU), South Asia (SAS), Pacific OECD Countries (PAO) and other Pacific Asia (PAS) have a high vegetarianism interest, whereas Sub-Saharan Africa, Centrally Planned Asia and Eastern Europe are associated with low vegetarianism interest. While this can be explained by culture to some extent, e.g. low meat consumption in India in South Asia due to religious reasons, the similarity of GDP per capita within regions plays an important role in this distinction between high and low interest (figure 6(b)). For instance, in high-GDP regions (Western Europe, North America and Pacific OECD) the vegetarianism interest is high. Being in Latin America or the Middle East makes a similar positive impact on the vegetarianism interest as being in Pacific OECD countries, despite the lower GDP per capita values of these regions.

Discussion
This study showed that the audience data of OSM platforms can be a useful source to analyse the drivers of low-carbon lifestyles at the global level by taking societal heterogeneity into account. Furthermore, specifically in the case of meat consumption, it highlighted the complex interplay between income, education, meat consumption and the interest in plant-based diets, since GDP per capita underlies both meat consumption and vegetarianism interest. Our findings showed that the fraction of the Facebook audience interested in vegetarianism in a country positively correlates with the average meat consumption per capita, implying that a wider interest in vegetarianism in a country does not by itself coincide with lower meat consumption. However, in the countries where the Facebook audience's interest is high, there is a declining trend in meat consumption. In other words, Facebook data does not indicate a negative relationship between vegetarianism interest and meat consumption on a global scale, but it captures the trend of increasing vegetarianism interest and declining consumption.
The second main finding of this study is that education is the most important driver of the Facebook audience's vegetarianism interest among basic demographic factors such as age, gender, education level, country-level GDP per capita and geographic region. High school and college graduates have a higher interest in vegetarian diets than others, and education plays a distinctive role especially in low-income regions. Vegetarianism interest among the doctorate graduates on Facebook is low, indicating that the positive relationship between education and vegetarianism interest is non-monotonic. However, since doctorate graduates constitute a very low fraction of the population and of the Facebook audience, the spread of vegetarianism and the representativeness of the Facebook audience at this education level should be further investigated before drawing a definite conclusion.
This study also showed that gender is a strongly distinguishing factor for vegetarianism interest on a global level, with females having a significantly higher interest. The young and middle-aged (20-49) have a wider tendency for vegetarianism interest, yet the difference between age cohorts is not sharp, and the youngest cohort of the Facebook audience included in this study (15-19) has the lowest fraction interested in vegetarianism. Our findings at the global level about the effect of education, gender, age and income on plant-based dietary choices resonate with empirical findings from the USA [57], Germany [60] and Belgium [61]. Therefore, this analysis of Facebook market segmentation data complements empirical studies by extending their findings to a global scale with larger samples, and also highlights peculiar issues, for instance regarding the youngest age cohort or the highest education level.
GDP per capita is found to be one of the key factors that make a positive impact on the vegetarianism interest. However, while it enables distinguishing the countries with low and high income, it is not a precise indicator of personal income, hence both this study and Facebook audience data are limited in investigating the effect of personal income on low-carbon lifestyle choices. This limitation can be addressed in future studies that focus on the income effect for instance by matching the social media audience data with country level personal income statistics based on demographic factors (e.g. education) and location, or by using proxies within the audience segmentation data such as interest in luxury as done in market research.
Another limitation that should be tackled while using Facebook audience segmentation data to analyse the drivers of low-carbon lifestyles is the availability of relevant interest categories and keywords. The interest group vegetarianism we used in this study is a relatively direct indicator of plant-based choices, whether the motivation behind this choice is animal welfare, religion or pro-environmental preferences. In other consumption areas such as heating and cooling demand, though, Facebook audience segmentation may not be as categorical as vegetarianism to represent the consumer preferences. Therefore, similar future analyses should be based on a representative relationship between the available interest categories and consumption areas.

Conclusion
Reduction of food and energy demand is often quoted as a highly promising climate change mitigation option. This requires widespread behavioural changes across the global population. Existing mitigation assessment frameworks, such as those used by the IPCC, are limited in feasibility consideration since they lack such behavioural aspects of consumer response [66]. However, it is of crucial importance to include behavioural considerations in mitigation scenarios by bridging across disciplines in order to guide decision-making for a sustainable and healthy future [67,68]. There are a few initial studies that incorporate behavioural factors into model-based integrated assessments of feasible mitigation potential [69-71], yet such quantitative analyses are bounded by data availability on a global scale.
This study addressed this data gap by investigating the drivers of low-carbon diets on a global scale based on the Facebook audience segmentation data. The conceptual agreement between the conventional empirical data and Facebook audience data shown in this study underlines the potential of combining these two sources for quantifying the trajectories of lifestyle change. In particular, while empirical studies and surveys shed light on the nuances of heterogeneity and provide a deeper understanding of low-carbon lifestyle choices, digital data, e.g. social media data, can extend the geographic, temporal and contextual scope of analysis and broaden the evidence.
The main policy implication of our findings is that education should be at the centre of policy design for stimulating low-carbon diets. Other main demographic factors such as gender and age are also distinguishing, with females and younger people having a stronger interest in plant-based diets. Therefore, social heterogeneity in terms of these key factors should guide the assessment of any policy lever that aims to incentivize low-carbon diets. Education can also be a powerful lever itself, especially to counteract the adverse effect of income. It is widely accepted that affluence has increased environmental degradation more than technological progress can prevent it; therefore, affluent citizens are central to reversing environmental degradation [72]. Although intervention studies report that targeted short education, such as that on the multiple adverse consequences of eating meat, does not necessarily lead to behaviour change [73], our results show that formal education level is a strong determinant of interest in plant-based diets. Therefore, if economic growth is to be continued and made 'green', school curricula can be instrumental in raising awareness of responsible consumption and sustainable choices among educated and affluent citizens.
The main implication of our findings for further research is that societal heterogeneity should be at the forefront of quantitative scenario studies that evaluate demand-focused mitigation and sustainability policies. Given that socio-demographic factors such as education, gender and age are highly important in lifestyle change, and hence in demand, their future projections should guide the development of demand scenarios. Large scale audience data of social media platforms, consistent across a large number of countries and large population groups, can assist scenario development by quantifying the demand depending on social heterogeneity. It can provide insights about temporal trends of low-carbon lifestyle interest if the data is tracked over time. Therefore, it can help couple behavioural models of societal dynamics with integrated assessment models of environment and economy to ensure the plausibility and feasibility of demand-focused mitigation scenarios.
Still, the demographic and socioeconomic heterogeneity explored in this study through Facebook audience data is not sufficient to capture the psychosocial drivers of lifestyle change. In addition to the data on audience size used in this study, text analysis such as topic modelling and sentiment analysis on user-generated content [74,75] can be useful to analyse the psychosocial drivers of lifestyle change. Social and personal norms, for instance, are highly cited drivers of dietary shifts and lifestyle change [69,76]. Social media data can be useful especially to quantify and simulate the social norm effect in lifestyle change scenarios. This requires scientists to access anonymous data about social connections and diffusion that are currently not public. Therefore, the need for more and better data to analyse low-carbon lifestyles echoes scientists' growing demand that technology companies make user data available for the common interest [77,78].