The response to nutritional labels: Evidence from a quasi-experiment.

This paper evaluates a UK policy that aimed to improve dietary information provision by introducing nutrition labelling on retailers’ store-brand products. Exploiting the differential timing of the introduction of Front-of-Pack nutrition labels as a quasi-experiment, our findings suggest that labelling led to a reduction in the quantity purchased of labelled store-brand foods, and an improvement in their nutritional composition. More specifically, we find that households reduced the total monthly calories from labelled store-brand foods by 588 kcal, saturated fats by 14 g, sugars by 7 g, and sodium by 0.8 mg. © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The recent increase in diet-related health problems has led to much discussion about whether and how we can improve the quality of dietary intake in the population. Policy makers have explored different ways to improve individuals' diets, such as by launching the "5-a-day" campaign (Capacci and Mazzochi, 2011), introducing targeted benefits for fruits and vegetables (Griffith et al., 2018a), taxing soft drinks (Fletcher et al., 2010; HM Treasury, 2018), and encouraging manufacturers to reformulate their products (Griffith et al., 2016a), for example by removing artificial trans-fats (Department of Health, 2011a). Another policy that has received much interest is nutrition labelling. This paper investigates the response to a large-scale UK policy that introduced Front-of-Pack (FOP) nutrition labels. In 2006, the UK Food Standards Agency (FSA) recommended that retailers adopt nutrition labelling on all store-brand (as opposed to branded) products within seven specific food categories. 1 Although this recommendation was voluntary, it was taken up by several retailers. We start by exploring the overall, aggregate impact of this policy by comparing changes in food purchases at retailers that introduced labelling to changes in purchases at retailers that did not introduce labelling. We then investigate the households' response by examining changes to the quantity and nutritional quality of foods purchased, distinguishing between products that were labelled and those that were not.
Our identification strategy uses three features. First, we exploit the fact that the timing of the introduction of FOP labels varied across retailers in a quasi-experiment. Second, we use the fact that nutrition labelling was introduced only for store-brand products within the seven food categories, but not for branded ones. Third, we account for any fixed household-retailer characteristics (we refer to the household-retailer combination as a spell), using only within-spell changes in dietary choices.
We make three main contributions to the literature. First, we exploit the quasi-experiment to estimate the effects of the policy on the quantity of foods households buy. Our triple-difference approach compares within-spell changes in food choices before and after the introduction of the policy for households shopping at retailers that introduced labelling (i.e. 'treated' households) to within-spell changes for households shopping elsewhere (i.e. 'control' households), distinguishing between store-brand (i.e. those potentially affected by labelling) and branded (i.e. not affected by labelling) products. We do this separately for foods that were recommended for labelling by the FSA (i.e. ready meals, burgers/sausages, pies, etc.) and foods that were not. Within the latter, we distinguish between 'indulgence' foods (i.e. cakes, desserts and cookies) and everything else, due to their distinctly different nutrient profiles. Indeed, these indulgence foods are the least healthy on a number of nutrients and they are a large food group on their own. Within each of these food groups, there are store-brand and branded products, but only within the foods recommended for labelling did (some) retailers introduce a FOP label to their store-brand products.
Second, we explore whether the policy affected the quality, or nutritional composition, of households' shopping baskets. We use a Nutrient Profile Score that captures the healthiness of foods to estimate whether the healthiness of households' shopping baskets changed in response to the policy. We also explore the effects of the policy on the actual nutrient purchases, distinguishing between the four nutrients shown on the FOP label: energy (kcal), saturated fats, sugars and sodium. 2 Finally, we contribute to the debate on the potential heterogeneous effects of food policies. Several studies show the importance of demographic and socio-economic characteristics in explaining the use and understanding of nutrition labels. 3 We build on this by incorporating such heterogeneities into our quasi-experimental setting.
Nutrition labelling is a frequently discussed topic in policy circles because it is an attractive public policy that targets transparency (Weil et al., 2013) and respects freedom of choice, without imposing hard regulations. Standard economic theory suggests that disclosure addresses typical market failures such as asymmetric information (see for example, Loewenstein et al. (2014) for a review and studies cited below). In addition, behavioural economics suggests that individuals have a certain propensity to err and may systematically over- or under-estimate the health consequences of their actions (Loewenstein et al., 2014). Therefore, the introduction of labelling is justifiable to inform individuals of the costs they may impose on themselves, but may fail to internalise at the time they shop. However, the expected effects of labelling are ambiguous. They depend on households' prior beliefs about the 'healthiness' of their shopping baskets (see for example Bollinger et al., 2011). In general, households are unlikely to have perfect knowledge of the nutritional composition of their purchases. Hence, labelling will increase the information available to them. If they consider their shopping baskets to be relatively healthy, but the labels suggest otherwise, they may decide to improve the nutritional quality of their baskets with the introduction of labelling. If, however, nutritional labels indicate that products are healthier than households initially believed, they may decide to reduce the nutritional quality of their baskets. Even if households are perfectly informed about the nutritional content of their purchases, labelling may increase its salience and affect their food choices.
Studies in psychology suggest that the way information is provided is important (see e.g. Loewenstein et al., 2014). As individuals display limited attention and cognitive knowledge, they will trade off the time and costs of finding and interpreting information. A key aspect is therefore information simplicity. However, studies investigating which type of label is most effective show mixed findings. Some studies find that categorical labels (e.g. displaying stars or letter grades) are better understood than continuous labels (see e.g. Thorne and Egan, 2002; Weil and McMahon, 2003; Malam et al., 2019), whilst others recommend the use of a hybrid system that combines simple colour-coded labels (also known as traffic lights) with Guideline Daily Amounts (GDA; Malam et al., 2019). 4 Others again show that traffic lights lead to more correct nutrient interpretation, and that they are better at inducing consumers to switch to healthier foods than GDA-type labels (see e.g. Kelly et al., 2009; Borgmeier and Westenhoefer, 2009; Hawley et al., 2013). Sacks et al. (2009) find that the traffic light system alone does not impact on the relative healthiness of consumer purchases. Hence, despite nutrition labelling being identified as an important determinant of food choice, with consumer research suggesting that labelling may help improve the nutrition and health of the population (Cowburn and Stockley, 2005; Lobstein and Davies, 2009), whether (and which) labels improve households' diets remains an empirical question.
Our results suggest that the introduction of labelling led to a reduction in the quantity purchased of labelled store-brand foods (i.e. ready meals, burgers/sausages, pies, etc.), as well as an improvement in their nutritional composition, showing a reduction in monthly calories purchased (of 588 kcal), saturated fats (13.7 g), sugars (6.9 g), and sodium (0.8 mg), with no significant changes in the nutritional composition of branded (and therefore unlabelled) versions of these foods. The changes are equivalent to a 9-14% reduction in average nutrient purchases of labelled store-brand foods, or a 0.1-0.9% change relative to the total monthly nutrients from all foods. Furthermore, in addition to improving the nutritional composition of labelled store-brand foods, we find an improvement in the healthiness of cakes, desserts and cookies. Although these results may be counter-intuitive, as these foods remained unlabelled throughout our observation window, this suggests that the introduction of FOP labels may have prompted households to think more widely about their dietary choices. 5 Hence, although cakes did not have front-of-pack labelling, the increased salience of nutrition information may have made households more conscious of the nutritional contents of their baskets, leading them to change the nutritional value of their grocery basket more generally, rather than such changes being confined to products with FOP labels only. Indeed, some psychological research reports that the provision of objective nutrition information can activate self-control and lead consumers to choose health over indulgence (see e.g. Hassan et al., 2010; Baumeister, 2002).
The rest of this paper is structured as follows. Section 2 highlights the relevant literature and Section 3 discusses the specific Front-of-Pack labelling policy that we exploit in this paper. Section 4 describes the data, with Section 5 presenting the empirical strategy, and Section 6 discussing the results. Section 7 presents the robustness analyses; Section 8 concludes.

Existing literature
There is a large literature on the determinants, use, understanding, and effectiveness of nutrition information. The work most relevant to our research is that related to nutrition labelling. A recent systematic review, investigating the relationship between food labelling and consumer food choices, finds that labelling is correlated with lower intake of energy and fats, whilst it increases vegetable consumption (Shangguan et al., 2019). Studies that focus on consumer preferences for labels find that consumers value information on calories, but that calorie information alone is not enough (Lando and Labiner-Wolfe, 2007; van Kleef et al., 2007; Malam et al., 2019). Grunert et al. (2010) show that the degree of understanding of nutrition labels is much higher than their usage: whilst only 27% of UK shoppers use nutrition information when making a selection in store, between 70% and 90% of individuals show a correct understanding of the label. Levels of understanding are also higher among younger cohorts and those in higher social classes (see also Sinclair et al., 2013; Hess and Siegrist, 2012; Feng and Fox, 2018), potentially due to better numeracy and health literacy in these groups (Feng and Fox, 2018). As such, providing new information through nutrition labels may differentially impact the higher and lower socio-economic groups. On the one hand, health capital theory argues that more educated individuals are more efficient and productive in their health investment decisions (Grossman, 1972), meaning that we may see a larger response to labels for these groups. On the other hand, these individuals have a higher opportunity cost of time, reducing their ability to acquire new information (Becker, 1977, 1975; Schultz, 1975). Furthermore, if they are better informed about the nutritional content of their grocery baskets, the labels do not necessarily provide 'new' information, perhaps leading to smaller changes in their dietary choices compared to the lower socio-economic groups, for whom the nutritional labels are more informative. Hence, it is unclear for which groups we expect labels to have a stronger effect, and it remains an empirical question which we explore below.
A second strand of the literature is concerned with identifying the causal consumer responses to food policies using quasi-experiments such as the introduction of the U.S. Nutrition Labelling and Education Act (NLEA) in the mid-1990s (Heike and Taylor, 2013; Guthrie et al., 1995; Wisdom et al., 2010), the introduction of menu calorie labelling in restaurants and food chains (Restrepo, 2016; Bleich et al., 2016, 2015; Downs et al., 2009; Bollinger et al., 2011; Elbel et al., 2009; Mathios, 2000; Variyam, 2008), the use of health advertising information in the cereal market (Ippolito and Mathios, 1990, 1995) and the introduction of FOP labels and a fat tax in France (Allais et al., 2015). These studies find some evidence of improved dietary choices due to the interventions, though the effectiveness often depends on the socio-economic and demographic characteristics of consumers.
Also related is the literature that explores the effect of labelling using field and lab experiments (Berning et al., 2010; Cawley et al., 2015; Kiesel and Villas-Boas, 2013; Heike and Taylor, 2013; Dubois et al., 2020). For example, Berning et al. (2010) argue that labelling on healthy foods may in fact decrease their purchases as products are perceived to be less tasty. However, Cawley et al. (2015) find that the introduction of a nutrition rating system led consumers to buy a more nutritious mix of products. Using a large RCT of front-of-pack nutrition labels, Dubois et al. (2020) also find small improvements in the nutritional quality of labelled foods.
Finally, although we only speak to this indirectly, a further strand of literature is concerned with the (strategic) responses of manufacturers and retailers to food policies (see e.g. Bonnet and Réquillart, 2013a,b; Griffith et al., 2016b,a; Namba et al., 2013; Nakamura and Zerom, 2010; Unnevehr and Jagmanaite, 2008; Shangguan et al., 2019). For instance, Griffith et al. (2016a) show that three-quarters of the reduction in salt intake following the UK government's strategy to reduce salt consumption was due to reformulation of food products by manufacturers. Similarly, Unnevehr and Jagmanaite (2008) argue that incentives to disclose the trans-fat contents in foods have led to product reformulation and modern biotechnology development of improved oilseed. Consistent with this, the meta-analysis by Shangguan et al. (2019) shows that food labelling significantly reduced the contents of trans-fats and sodium (see also Namba et al., 2013; Wu and Sturm, 2014; Louie et al., 2012; Vyth et al., 2009). In summary, therefore, this literature shows that manufacturers actively respond to government policies by reformulating their products and investing in technological development, directly affecting households' shopping baskets.

Front-of-Pack labelling
In 2006, the UK Food Standards Agency (FSA) recommended that retailers start Front-of-Pack (FOP) labelling for store-brand products on seven types of food: ready meals, burgers/sausages, pies, breaded/coated meats, pizza, sandwiches and cereals. These foods were chosen because consumer research indicated that individuals had difficulty assessing their nutritional quality, and because they tend to be eaten frequently or in large quantities (Denny, 2006). 6 There is no evidence that the FSA lobbied with other manufacturers to introduce labelling. In fact, FOP labelling was not adopted by any of the major food manufacturers (NHF, 2007b). Therefore, labelling was only introduced for store-brand versions of the seven food categories within specific retailers, with no changes in the labelling of branded products.
FOP labels provide a summary of the nutritional content of each food product, showing the amount of energy, saturated fats, sugars and sodium. This information is provided per 100 g or millilitres, with some products additionally providing the information per serving if the serving size is greater than 100 g (Food Standards Agency, 2007).
There are three types of FOP labels currently in use: the traffic light system (TLS), guideline daily amounts (GDAs) and a hybrid version using elements of both. TLS is a colour-coded scheme, denoting the amount of calories, fats, salt and sugars in a product by the colours red (high), amber (medium) and green (low). Both the colour and its characterisation as low, medium or high are displayed on the FOP label. GDAs show the contribution that each of these nutrients makes towards the adult GDA, but do not involve colours. 7 The hybrid scheme combines the former two, displaying both colour coding and percentage contributions on the key nutrients. In our sample, retailers introduced different labels: Waitrose and Co-Op introduced a Traffic Light System, whilst Marks & Spencer and Asda introduced a hybrid system (see also Table B.1, Appendix B). The other retailers did not introduce any labelling scheme. The choice regarding labelling introduction as well as the type of label were entirely up to retailers; we return to this issue below.
6 Table B.1, Appendix B, shows when each retailer introduced labelling and distinguishes between the type of label introduced. We use information on the latter in Section 7.6. Tesco and Sainsbury's had already introduced a labelling system prior to the FSA recommendations (see British Retailing Consortium). We include purchases from Tesco in our analyses, but do not consider Sainsbury's because, although they introduced labelling from January 2005, they phased out the introduction for the seven food categories over several months and would not specify the exact date of introduction.
7 The nutrient guidelines are based on figures published in the Dietary Reference Values for Food Energy and Nutrients for the UK, published by the Committee on Medical Aspects of Food and Nutrition Policy (COMA) in 1991 (Department of Health, 1991). For salt, recommendations of the Scientific Advisory Committee on Nutrition (SACN) were followed (Public Health England, 2019), while calculations for total sugars were as described by Rayner et al. (2004).

Data and descriptive statistics
We use data on all grocery purchases brought into the home by a rolling panel of households from Great Britain. Our analysis focuses on the period July 2005 to July 2008, when only the four retailers mentioned above introduced nutrition labelling. 8 The data are collected by the market research firm Kantar Worldpanel, who ensure the panel remains broadly representative over time, with household demographics being routinely collected and re-assessed approximately every nine months (Leicester and Oldfield, 2009). Purchases are recorded at the individual transaction level using a handheld scanner in the home. The data also contain information on non-barcoded items such as loose fruit and vegetables or meat from the meat counter. 9 The advantages of these data are that they are longitudinal, they provide various demographic characteristics, and report very detailed information on each food product. We have information on quantities, prices, and characteristics at the level of the individual product, including whether it was on promotion and the products' nutritional values. The detailed product-level information allows us to identify precisely which foods in which retailers were exposed to the new nutrition labelling (e.g. store-brand versus branded foods).
8 We explicitly end the observation window before the start of the Great Recession, as its effects on food prices and disposable incomes may vary across localities in the UK. Furthermore, the European Parliament voted in favour of FOP labelling in July 2010. By ending the period of observation before any such new Regulations, we again avoid potential confounding by other factors.
9 Households are given a booklet with barcodes for various non-barcoded foods for them to scan and record purchase information.
Although these data are increasingly used in economics and social science research, it is important to also highlight their limitations. For instance, it is known that there are difficulties in attracting some demographic groups, such as single young males. Furthermore, given the longitudinal nature of the data, participants may suffer from 'survey fatigue', reducing their reporting accuracy over time. Leicester and Oldfield (2009) show that such fatigue effects are stronger for top-up products (e.g. bread, milk). We discuss how we deal with this below. To avoid potential biases, Kantar themselves also take considerable effort to monitor participants and remove them from the panel if they believe this is a problem (Griffith et al., 2018a). Leicester and Oldfield (2009) and Griffith and O'Connell (2009) include a detailed investigation of the quality of the Kantar data, and compare the data to other UK surveys. They show that conditioning on households that regularly report spending on a range of grocery products ensures that the Kantar data follow the patterns and trends seen in other UK data sources. Leicester and Oldfield (2009) compare the Kantar data to the Living Costs and Food Survey (LCFS; one of the main UK food and nutrition surveys), as well as the British Household Panel Survey (BHPS) and conclude that attrition and reporting error are relatively low. Furthermore, they show that the data have similar sociodemographic and regional profiles (see also Quirmbach et al., 2018), suggesting that they are broadly representative of the UK population.
Looking specifically at the nutritional data, Griffith and O'Connell (2009) show that the Kantar data provide significantly more information than what is available in the LCFS or the National Diet and Nutrition Survey (NDNS; the other main UK food and nutrition survey). Furthermore, they show that the Kantar data avoid the problem of under-reporting that is well-known to occur in intake surveys.
Our analysis is based on households' main monthly shopping trip. We focus on this sample for three reasons. First, households tend to buy the majority of their food in their main shopping trip, allowing us to investigate how they substitute across as well as within food groups. Indeed, our data indicate that households purchase 70% of all foods in their main shopping trip. Second, consumer research has identified time spent shopping as an important determinant of labelling use (Nayga et al., 1998; Park et al., 1542; Beatty and Smith, 1987; Mothersbaugh et al., 1993). We therefore focus on the shopping occasions for which labelling use may be more binding. Third, focusing on the main shopping trip implies that survey fatigue, which is strongest for top-up products (Leicester and Oldfield, 2009), plays less of a role. To define the main monthly shopping trip, we take the sum of all purchases in each household-month-shop ID combination, where shop ID denotes the shop postcode, and select the highest monthly food spending. 10 We follow the literature and drop household-months during which the household does not record purchasing anything for seven days (see e.g. Griffith and O'Connell, 2009; Leicester and Oldfield, 2009; Griffith et al., 2016a, 2018a). 11 We consider a number of demographics, including household size, the number of children, binary indicators for whether the main shopper is employed full-time or part-time, is retired, or is unemployed/not working, marital status, social class, and the age and gender of the main shopper. 12
10 For households who shop at multiple shops, we explored the robustness of our estimates to weighting the expenditures by the inverse of the proportion purchased in the main shop (i.e. by $\left(q / \sum q\right)^{-1}$, where $q$ is the quantity purchased (in kg) in one shop ID, and $\sum$ sums over all transactions in a month (in all shop IDs)). The estimates are robust to this different specification.
11 Our results are robust to dropping household-months during which the household does not record purchasing anything for 14 days. We discuss these in the robustness analyses; Section 7.6.
12 We follow Griffith et al. (2016a) and base our classification of socioeconomic status on the National Readership Survey social grade; a standard approach in market research. Higher social classes include top managerial roles, administrative roles and professional occupations; intermediate classes include supervisory roles, clerical and junior management, and skilled manual occupations; and lower classes include semi-skilled and unskilled manual workers, unemployed and lower grade occupations. The socioeconomic classification is based on the occupation of the main earner. More details are available at http://www.nrs.co.uk/nrs-print/lifestyle-and-classification-data/ [last accessed in August 2019].
Our main analysis considers nine of the biggest retailers in the UK: Tesco, Asda, Morrisons, Somerfield, Co-Op, Waitrose, Lidl, Aldi, and Marks & Spencer, with a total market share of 76%, ranging from 2% for Asda to 32% for Tesco, although our estimates are robust to using all retailers. We sum up all household purchases for a set of food 'categories', such as ready meals, burgers, dairy foods, and cookies, which each belong to one of the following food groups: "Foods with labelling" (i.e. those recommended for labelling by the FSA), and "Foods without labelling", where, within the latter, we distinguish between cakes/desserts/cookies and all other foods without labelling due to their distinctly different nutrient profiles. Furthermore, Grunert et al. (2010) show that consumers prefer not to see nutrition information on 'indulgence products'. Hence, separating this food group from other unlabelled foods allows us to explore whether the response to labelling differs for this food group compared to others. 13 For each of the food categories, we distinguish between store-brand and branded foods, so that the total spending for store-brand and branded foods sums up to the total home food basket (except for breakfast cereals). Therefore, each observation in our dataset is a household-food category-brand type-store-month-year. 14
13 We hereafter refer to cakes/desserts/cookies as "Cakes". We drop sandwiches and cereals from our analyses for two reasons. First, sandwiches are generally purchased for immediate consumption, and therefore not brought into the home and not observed in these data. Second, Asda's breakfast cereals were not labelled until several months after labelling was introduced for other products. However, as cereals are generally consumed for breakfast only, it is unlikely that dropping them would affect the substitution with other meals.
14 Within the "foods with labelling" group, the food categories include ready meals, burgers, pies, meats, and pizza. Within the "foods without labelling" group, the food categories include dairy, breads/pasta, snacks, and fish/poultry. Within the "Cakes" group, the food categories include cakes, desserts and cookies/biscuits. 'Brand type' refers to either store-brand or branded products.
Our main dataset includes 360,921 observations for 20,707 households between July 2005 and July 2008. Table 1 presents the main household characteristics, showing that the majority of households contain 2 or 3 individuals. 64% of households are married, and 46% are in higher social classes. The main shopper is predominantly female, with an average age of 48.6. 15 Table 2 presents the baseline (i.e. before March 2006, when the first labelling scheme was introduced) means and standard deviations of the quantity and nutritional quality of households' shopping baskets, including the total monthly nutrients purchased. 16 Information on nutrients is collected by Kantar from the back of pack nutrient information. Where nutritional labels are unavailable because products do not display them (e.g. fruit and vegetables), Kantar uses nutrients from McCance and Widdowson (2014), and they impute nutrients from similar products when foods are purchased with insufficient frequency. Table 2 shows that the quantity purchased is highest for foods without labelling, both for store-brands and branded foods. Households purchase approximately 5 kg of cakes a month, split equally between store-brand and branded products. This food group is the most unhealthy, with a nutrient profile score of 15-17, and with branded products being slightly less healthy than the store-brand versions on all individual nutrients.
15 Note that this does not imply that our sample differs from the gender distribution of the general UK population. Instead, the individual-level characteristics reported in Table 1 refer to the main shopper in the household, which, for the majority of households, tends to be the female. This is consistent with other data sources such as the Living Costs and Food Survey (see e.g. von Hinke, 2020).
16 The nutritional quality score is based on the Nutrient Profiling Model used by the FSA (Rayner et al., 2004). In short, this attaches a score to each individual product depending on the amount of energy, saturated fats, sugar, sodium, fibre, proteins, fruit and vegetables it contains. It separately looks at negative and positive nutrients (where the former include energy, saturated fats, sugar and sodium; for more information, see Appendix A). In our analysis, we only use the negative score of the Nutrient Profile Model, since these are the exact nutrients that are shown on the FOP label and therefore made more salient by this policy. Higher scores indicate more unhealthy foods. We report the average score across all individual products within a food category.
The nutritional quality is measured using the nutrient profile score (see Rayner et al., 2004). For more detail about this score, see Appendix A.
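The definition of the main monthly shopping trip used in this section (sum spending within each household-month-shop ID combination, then keep the shop with the highest total) can be illustrated with a minimal Python sketch. The function name `main_monthly_shops` and the toy records are hypothetical and are not taken from the Kantar data; the sketch also breaks ties by keeping the first shop encountered, which is an assumption rather than a rule stated in the text.

```python
from collections import defaultdict


def main_monthly_shops(transactions):
    """Return {(household, month): (shop_id, total_spend)} for the shop with
    the highest summed spend in each household-month.

    `transactions` is an iterable of (household, month, shop_id, spend).
    Ties are resolved in favour of the first shop encountered (an assumption).
    """
    # Step 1: total spend per household-month-shop combination.
    totals = defaultdict(float)
    for household, month, shop_id, spend in transactions:
        totals[(household, month, shop_id)] += spend

    # Step 2: keep the shop with the highest total in each household-month.
    best = {}
    for (household, month, shop_id), total in totals.items():
        key = (household, month)
        if key not in best or total > best[key][1]:
            best[key] = (shop_id, total)
    return best


# Hypothetical toy records: (household, year-month, shop postcode, spend in GBP).
toy = [
    ("h1", "2006-03", "BS1", 20.0),
    ("h1", "2006-03", "BS1", 35.0),
    ("h1", "2006-03", "BS8", 40.0),
    ("h2", "2006-03", "M1", 12.5),
]
result = main_monthly_shops(toy)
```

In this toy example household `h1` spends more in total at shop `BS1` than at `BS8`, even though its single largest transaction is at `BS8`, so `BS1` is selected as the main shop for that month.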

Difference-in-difference models
We start by evaluating the overall effect of FOP labelling using a difference-in-difference design, comparing the change in spending, quantity or nutritional value of food purchases in retailers that introduced labelling (before and after its introduction) to the change in retailers that did not introduce labelling. This provides an estimate of the aggregate effect of the policy, given by:
$$
y^{j}_{rt} = \alpha_{0,j} + \alpha_{1,j}\,\mathit{Labelling}_{rt} + \psi_{r,j} + \tau_{t,j} + \varepsilon_{rt,j}
$$
where $y^{j}_{rt}$ is the quantity (in kg), nutritional value (i.e. nutrient profile score), total spending or expenditure share of food group $j$ (with $j = 1, 2, 3$, i.e. labelled foods, unlabelled foods and cakes), purchased in retailer $r$ at time (year-month) $t$. $\mathit{Labelling}_{rt}$ is a dummy that is equal to one if retailer $r$ introduced labelling by time $t$. Retailer fixed effects are denoted by $\psi_{r,j}$, $\tau_{t,j}$ are time fixed effects, and $\varepsilon_{rt,j}$ is the error term. The parameter $\alpha_{1,j}$ captures the effect of labelling introduction on the outcome of interest, comparing retailers that introduced labelling to those that did not. We also estimate this model for the entire food basket (i.e. $j \in \{1, 2, 3\}$), allowing us to estimate the overall effect of the labelling policy on retailers that introduced labelling versus those that did not.
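To make the mechanics of this difference-in-difference design concrete, the following Python sketch estimates a regression with a labelling dummy, retailer fixed effects and time fixed effects on a synthetic retailer-month panel for a single food group. Everything in it is an illustrative assumption: the function name `twfe_did`, the number of retailers and months, the adoption months and the "true" effect are invented, and the data are generated without noise so that the coefficient is recovered exactly. It is not the authors' estimation code.

```python
import numpy as np


def twfe_did(y, d, unit, time):
    """OLS of y on a treatment dummy plus unit and time dummies.

    Returns the coefficient on the treatment dummy.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    unit = np.asarray(unit)
    time = np.asarray(time)
    units = np.unique(unit)
    times = np.unique(time)
    unit_dummies = (unit[:, None] == units[None, :]).astype(float)
    # Drop the first period to avoid perfect collinearity with the unit dummies.
    time_dummies = (time[:, None] == times[None, 1:]).astype(float)
    X = np.column_stack([d, unit_dummies, time_dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]


# Synthetic panel (all values hypothetical).
n_retailers, n_months = 6, 12
adoption = {0: 4, 1: 7}  # retailer -> first month with labels; others never adopt
true_effect = -0.5

rng = np.random.default_rng(0)
retailer_fe = rng.normal(size=n_retailers)
month_fe = rng.normal(size=n_months)

rows = [(r, t) for r in range(n_retailers) for t in range(n_months)]
unit = np.array([r for r, _ in rows])
time = np.array([t for _, t in rows])
d = np.array([1.0 if r in adoption and t >= adoption[r] else 0.0 for r, t in rows])

# Outcome = retailer effect + month effect + labelling effect (no noise).
y = retailer_fe[unit] + month_fe[time] + true_effect * d

estimate = twfe_did(y, d, unit, time)
```

Because the synthetic outcome is an exact additive function of the retailer effects, the month effects and the labelling dummy, `estimate` equals `true_effect` up to floating-point error. With real data the coefficient would be estimated with sampling error and would require appropriate standard errors.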
In addition to estimating the aggregate effect of the policy, which would be of interest to public health officials and policy makers, we examine whether households substitute within or between food groups in response to the introduction of labelling. To investigate this in more detail, we explore the household-level data and turn to a triple difference approach where we distinguish between store-brand (some of which were labelled) and branded products (none of which were labelled).

Difference-in-difference-in-difference models
To investigate whether the introduction of FOP labelling affected the quantity of foods purchased by households and the nutritional quality of their food baskets, whilst distinguishing between labelled foods, unlabelled foods and (unlabelled) cakes as well as between store-brand and branded products, we estimate the following triple difference model:
$$
y^{j}_{hit} = \beta_{1,j}\,(\mathit{Labelling}_{ht} \times SB_{hi,j}) + \beta_{2,j}\,(\mathit{Labelling}_{ht} \times (1 - SB_{hi,j})) + \beta_{3,j}\,SB_{hi,j} + \sum_{k} \gamma_{jk}\,p_{hit,k} + \delta_{j}\,x_{ht} + Z_{ht}'\theta_{j} + \lambda_{s,j} + \eta_{j} t + \rho_{r(h,t),j} \times t + \tau_{t,j} + u_{hit,j}
$$
where $y^{j}_{hit}$ indicates either the quantity or nutritional quality of food category $i$ nested within food group $j$, purchased by household $h$ at time $t$, where again $j = \{1, 2, 3\}$. Similar to above, we also estimate this model for the entire food basket (i.e. $j \in \{1, 2, 3\}$), allowing us to explore the overall effect of FOP labelling on households' food shopping.
$\mathit{Labelling}_{ht}$ is a dummy variable that is equal to 1 if the retailer that household $h$ shops at introduced labelling by time $t$. Hence, there is temporal variation in the timing of labelling introduction, as well as household-level variation in the choice of retailer. Both of these may be endogenous, as the retailer decides if and when to start its labelling, and the household chooses where to do its grocery shopping. We discuss both issues in more detail below.
$SB_{hi,j}$ is a dummy variable that equals 1 if food category $i$ within food group $j$ purchased by household $h$ is store-brand and 0 otherwise, and prices are denoted by $p_{hit,k}$, with $j$ and $k$ indicating different food groups. 17 Hence, prices are household-specific, reflecting the fact that households shop in different stores and therefore face different prices, but also that they choose different products within each food category. We discuss how we deal with potential endogeneity of prices (as well as total food spending, denoted by $x_{ht}$) below.
$Z_{ht}$ is a vector of household demographic variables discussed above, and $\lambda_{s,j}$ are spell fixed effects, where the spell-level heterogeneity is defined as $\lambda_{s,j} \equiv \vartheta_{h,j} + \psi_{r(h,t),j}$ (as in Abowd et al. (1999) and Andrews et al. (2006)). In other words, we include household-retailer combination fixed effects, accounting for any time-invariant heterogeneity within these spells. This allows us to estimate how $y^{j}_{hit}$ changes within a particular household-retailer combination, before and after the introduction of labelling for store-brand and branded foods. We control for a general trend in consumption over time, captured by $\eta_{j}t$, and allow for retailer-specific linear departures from this trend, denoted by $\rho_{r(h,t),j} \times t$. Finally, $\tau_{t,j}$ are year and month dummies, accounting for systematic changes in $y^{j}_{hit}$ across years, as well as for any seasonality, and $u_{hit,j}$ is the error term; standard errors are clustered by household. We are interested in the estimates of $\beta_{1,j}$ and $\beta_{2,j}$, which capture the effect of the introduction of labelling on the quantity and quality of food purchases for store-brand and branded foods respectively. 18 Hence, our analysis compares within-spell changes in food choices before and after the introduction of the policy for households that shop at retailers that introduced labelling ('treated households'), to within-spell changes for households shopping elsewhere ('control households'). As such, observing a reduction in the Nutrient Profile Score for e.g. labelled foods after the introduction of labelling would suggest that households change their dietary choices within labelled foods to make them healthier. 19

17 Prices are deflated by the Consumer Price Index for food and drinks with November 2005 as base period.
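The logic of the triple difference can be sketched with cell means: a difference-in-differences is computed separately for store-brand and branded purchases, and the two are then differenced. This is a minimal sketch under strong simplifying assumptions (two periods, no covariates, prices, trends or spell fixed effects); the data and the function names are hypothetical.

```python
from statistics import mean


def cell_mean(rows, treated, post, sb):
    """Mean outcome in one (treated, post, store-brand) cell."""
    return mean(y for t, p, s, y in rows if (t, p, s) == (treated, post, sb))


def did(rows, sb):
    """Difference-in-differences within one brand type (sb=1: store-brand)."""
    treated_change = cell_mean(rows, 1, 1, sb) - cell_mean(rows, 1, 0, sb)
    control_change = cell_mean(rows, 0, 1, sb) - cell_mean(rows, 0, 0, sb)
    return treated_change - control_change


def triple_difference(rows):
    """Store-brand DiD minus branded DiD."""
    return did(rows, 1) - did(rows, 0)


# (treated, post, store_brand, log quantity); made-up numbers for illustration.
toy = [
    (1, 0, 1, 2.00), (1, 1, 1, 1.90),
    (0, 0, 1, 2.10), (0, 1, 1, 2.10),
    (1, 0, 0, 1.50), (1, 1, 0, 1.52),
    (0, 0, 0, 1.40), (0, 1, 0, 1.40),
]
ddd = triple_difference(toy)
print(round(did(toy, 1), 2), round(did(toy, 0), 2), round(ddd, 2))
```

In this toy example the store-brand and branded difference-in-differences play the roles of the labelling effects for store-brand and branded foods, and their gap is the differential response of store-brand purchases.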

Identification
To estimate the parameters in Eq. (2) consistently, we require strict exogeneity of $u_{hit,j}$. There are two potential concerns with this, relating to the orthogonality of prices as well as total expenditures. Prices are likely to be endogenous, as they partially reflect differences in quality from one household to another, and therefore depend on household tastes (Deaton, 1988). For example, we are likely to observe higher prices for households whose food basket consists of higher quality products.
Total expenditures ($x_{ht}$) may be endogenous due to measurement error or unobserved household-level characteristics being correlated with the quantity and quality of purchases. For example, any idiosyncratic demand or preference shocks, or taste heterogeneity, may affect the quantity and quality of purchases, as well as total expenditures.
We deal with these issues in three ways. First, we include a vector of household demographic variables, $Z_{ht}$, discussed above. Second, we exploit the panel structure of our data and include spell fixed effects $\lambda_{s,j}$, using only variation in quantity, quality, prices, and total expenditures within spells. Third, we instrument for prices and total food spending.
Our main analysis instruments for prices, using an approach similar to Hausman (1994) and Nevo (2001), where the instruments are defined as the prices faced by households if they had shopped in other stores within the same retailer. The identifying assumption is that after controlling for demographics and spell fixed effects, store-specific demand shocks, or valuation of food groups, are independent across stores. Although other store prices also reflect quality, they do not reflect household h's specific valuation of the quality. Given this, a demand shock for one food group is independent of the price of the food in other stores. 20 We follow Griffith et al. (2018a) and instrument for total food spending using total expenditure on fast moving consumer goods. 21 This assumes that preferences for non-food products are weakly separable from preferences over the different food groups.

18 Appendix D discusses and shows the estimates where we embed this triple difference approach into an Almost Ideal Demand System (AIDS; Deaton and Muellbauer, 1980), modelling the expenditure shares and allowing us to explore whether households substitute between the three food groups.

19 This triple difference approach assumes that the trend for treated households would be similar to the trend for control households in the absence of labelling. As we show later (see Section 7.3), we find no evidence to reject this assumption.
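To make the instrument construction concrete, the sketch below builds a Hausman-type instrument as the average price in the other stores of the same retailer, and then applies the textbook just-identified IV formula with a single regressor. Everything here is a simplification and an assumption for illustration: the store names, prices, and helper functions are hypothetical, and the models in the text instrument several prices and total spending jointly, with covariates and spell fixed effects.

```python
from statistics import mean


def leave_out_mean_prices(prices_by_store):
    """For each store, average the prices observed in the *other* stores
    of the same retailer (a Hausman-type instrument)."""
    total = sum(prices_by_store.values())
    n = len(prices_by_store)
    return {s: (total - p) / (n - 1) for s, p in prices_by_store.items()}


def iv_slope(y, x, z):
    """Just-identified IV estimate with one regressor: cov(z, y) / cov(z, x)."""
    zbar, ybar, xbar = mean(z), mean(y), mean(x)
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den


# Hypothetical prices of one food category in three stores of one retailer.
store_prices = {"store_a": 1.00, "store_b": 1.20, "store_c": 1.10}
instrument = leave_out_mean_prices(store_prices)

# Toy data in which the outcome is exactly twice the regressor.
x = [1.0, 2.0, 3.0, 4.0]
z = [1.0, 1.0, 2.0, 2.0]
y = [2.0 * xi for xi in x]
slope = iv_slope(y, x, z)
print(instrument, slope)
```

Because a household's own store is excluded from the average, the instrument does not load on that household's own product choices, which is the intuition behind the exclusion restriction described above.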

Results
This section presents the estimates of the effect of the introduction of FOP labelling on household food choices. Before doing so, we report the aggregate effect of the policy on retailers, as in Eq. (1). Table B.1 of Appendix B presents the estimates. Panel A shows the effects on the quantity and nutritional quality of foods sold, and Panel B shows the estimates for total (absolute) spending and expenditure shares. For each of these outcomes, we show the effects for the total food basket (columns 1 and 5), and for the three food groups separately (columns 2-4 and 6-8). Recall here that higher nutrient profile scores indicate unhealthier foods. The findings show that, at the aggregate level, the policy improved the nutritional quality of the total food basket, with no statistically significant changes in quantity, total spending and expenditure shares (except for a small positive effect on quantity for foods without labelling). With no change in spending and quantity, but an improvement in nutritional quality, this suggests that retailers sold similar amounts of food after the introduction of labelling, but that these sales consisted of different, healthier products.
We next explore whether the introduction of FOP labelling changed households' quantity and quality of foods purchased, as well as whether it led to substitution between labelled (i.e. store-brand foods of seven specific food categories) and unlabelled foods (i.e. all other store-brand foods, and all branded products). Table 3 presents the estimates of the triple difference specification from Eq. (2). Columns 1-4 use the natural logarithm of quantity as the dependent variable; columns 5-8 specify the nutritional quality. For each of these, we again show estimates for the entire food basket (columns 1 and 5), and for each of the three food groups separately (columns 2-4 and 6-8). 22 Looking first at the quantity and quality of the total food basket (columns 1 and 5), we find that the introduction of labelling led to a reduction in the total quantity of branded foods and an increase in the quantity of store-brand foods. This coincided with a change in the composition of store-brand food purchases, making them healthier (i.e. a reduction in the nutrient profile score). Looking more closely at the three food groups that make up the total food basket, however, we find that the aggregate effect conceals differential effects across the three food groups. More specifically, we find that the introduction of labelling reduced the quantity of labelled store-brand foods by 6.5%, whilst improving their nutritional composition. Simultaneously, consumers increased their purchases of unlabelled store-brand foods and reduced purchases of unlabelled branded foods. Finally, we find that labelling led to an improvement in the nutritional composition of store-brand cakes, but a deterioration in that of branded ones. Although the impact of the introduction of FOP labels on unlabelled foods may be counter-intuitive, this suggests that nutrition labelling has made households more aware and conscious of the nutritional content of their grocery baskets, leading them to make changes to their grocery basket more generally, rather than changes confined to labelled products only. We return to this point in the conclusion.

20 In Section 7.2, we explore the sensitivity of these analyses to using alternative definitions of the instrument.

21 This includes other items that are commonly purchased in supermarkets, such as toiletries and household products. We refer to this as non-food.

22 All estimates are obtained from IV regressions, using "Hausman"-type instruments for prices and instrumenting food spending with non-food spending. The first stage F-statistics exceed the critical values outlined in Stock and Yogo (2005), suggesting our instruments are sufficiently strong. We do not report these here; they are available upon request.
We next explore what the changes in nutrient profiles mean in terms of the actual nutrient content of the shopping basket. Table 4 distinguishes between the four nutrients that are displayed on the FOP label, showing that households responded by reducing the total monthly calories from store-brand labelled food purchases by 588 kcal, saturated fats by 13.7 g, sugars by 6.9 g, and sodium by 0.8 mg, with no significant changes in the nutritional composition of branded foods within the same food group (columns 2 and 6). Relative to mean monthly nutrient purchases of store-brand labelled foods (see Table 2), these changes correspond to a 9-14% reduction, on average. 23 Similar to Table 3, we find an improvement in the nutritional composition of store-brand cakes: a reduction of 400 kcal, 11.5 g of saturated fats, 37.9 g of sugars, and 0.58 mg of sodium. In sum, our results suggest that the introduction of nutrition labelling affected household food choices, reducing the quantity of store-brand labelled foods, whilst improving their nutritional quality.
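The relative magnitudes for calories can be checked with a few lines of arithmetic, using the figures reported in the paper for mean monthly calories (4757 kcal from store-brand labelled foods; 51,606 + 34,000 kcal for the total basket) and the insignificant 85 kcal increase from branded labelled foods. The helper name `percent_of` is only illustrative.

```python
def percent_of(change, baseline):
    """Express a change as a percentage of a baseline."""
    return 100.0 * change / baseline


kcal_reduction = 588             # store-brand labelled foods (Table 4)
sb_labelled_mean = 4757          # mean monthly kcal, store-brand labelled foods (Table 2)
total_basket = 51_606 + 34_000   # mean monthly kcal, all foods
branded_increase = 85            # insignificant increase, branded labelled foods

share_of_group = percent_of(kcal_reduction, sb_labelled_mean)
share_of_basket = percent_of(kcal_reduction - branded_increase, total_basket)
print(round(share_of_group), round(share_of_basket, 1))  # 12 0.6
```

The calorie reduction is therefore sizeable relative to store-brand labelled foods, but small relative to the household's total monthly calorie purchases.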

Robustness analyses
We next explore the robustness of our findings, investigating the sensitivity to the use of different instrument sets for prices, exploring the timing of the labelling effects, investigating potential spillover effects, and accounting for retailers' stock of products. We also run the analysis with different empirical specifications and sample definitions, and we end with an investigation of potential heterogeneous effects. However, before we do this, we explore the potential effects of FOP labelling on retailers.

23 For example, a reduction of 588 kcal from store-brand labelled foods is 12% of average monthly kcals obtained from store-brand labelled foods (i.e. from 4757 kcal; see Table 2). However, it is only 0.6% of the total monthly kcal purchases from all (store-brand, branded, labelled and unlabelled) foods (i.e. the total basket being 51,606 + 34,000 kcal, on average), even when taking into account the small (insignificant) increase of 85 kcal from branded labelled foods (see Table 4). Reductions in saturated fats, sugars and sodium from store-brand labelled foods were 14%, 13% and 9% respectively, relative to the average monthly nutrients obtained from store-brand labelled foods, or 0.9%, 0.1%, and 0.6% relative to the total monthly nutrients from all foods.

Notes to Table 3: The outcome variables include the (natural) logarithm of quantity (in kg; columns 1-4) and the nutrient profile score (columns 5-8). We provide estimates for the total food basket (columns 1 and 5), as well as for each of the three food groups (columns 2-4 and 6-8). The table shows the estimates from the triple difference specification in Eq. (2). All specifications control for household-level covariates, time trends, retailer-specific time trends, year and month dummies, and spell fixed effects. All models instrument for prices and expenditure using the prices faced by households if they had shopped in other stores within the same retailer, and non-food spending, respectively. Standard errors, clustered by household, shown in parentheses. * p < 0.05. ** p < 0.01.

Table 4
Triple-difference models of the nutritional composition of the food basket.

Notes: The outcome variables include the individual nutrients that make up the shopping basket, distinguishing between calories (in kcal; Panel A, columns 1-4), saturated fats (in grams; Panel A, columns 5-8), sugars (in grams; Panel B, columns 1-4), and sodium (in milligrams; Panel B, columns 5-8). We provide estimates for the total food basket (columns 1 and 5), as well as for each of the three food groups (columns 2-4 and 6-8). The table shows the estimates from the triple difference specification in Eq. (2), where the dependent variable is the individual nutrient. All specifications control for household-level covariates, time trends, retailer-specific time trends, year and month dummies, and spell fixed effects. All models instrument for prices and expenditure using the prices faced by households if they had shopped in other stores within the same retailer, and non-food spending, respectively. Standard errors, clustered by household, shown in parentheses. * p < 0.05. ** p < 0.01.

Retailer analysis
One potentially important consequence of the introduction of labelling is that retailers themselves may respond, adopting their own strategies in response to labelling introduction. Since our data capture consumer choices (as opposed to retailer decisions), we are limited in how far we can explore this: we do not observe retailer prices or inventory data, and have to rely on consumer purchases to observe the relevant variables. Appendix C discusses the issues that this raises in more detail and also shows our descriptive analysis, where we investigate the effect of the introduction of labelling on six different dependent variables that are (to some extent) under the retailer's control: the price of foods, the proportion of foods on promotion, the nutritional quality (to capture potential reformulation), the pack size, and the extent to which new products were introduced or old products were discontinued. We find that the introduction of labelling coincided with an improvement in the nutritional quality of foods, suggesting that the introduction of labelling caused some retailers to reformulate products. Furthermore, our results suggest that retailers brought forward the discontinuation of some products so that it took place before labelling introduction. In addition to these findings being interesting in their own right, they suggest that our household analysis is picking up not only the effect of labelling on household demand, but potentially also any effects of retailers' decisions, in particular in terms of product reformulation and the discontinuation of products. Having said that, any reformulation by retailers is unlikely to fully explain the improvement in the quality of households' shopping baskets. Indeed, the retailer analysis suggests that reformulation improved the nutritional quality of store-brand labelled as well as unlabelled foods, indicating that retailers' reformulation affected all store-brand products, rather than being restricted to store-brand labelled foods only.

Notes to Table 5: The outcome variables include the (natural) logarithm of quantity (in kg; columns 1-4) and the nutrient profile score (columns 5-8). We provide estimates for the total food basket (columns 1 and 5), as well as for each of the three food groups (columns 2-4 and 6-8). The table shows the estimates from the triple difference specification in Eq. (2). All specifications control for household-level covariates, time trends, retailer-specific time trends, year and month dummies, and spell fixed effects. Panel A instruments prices and expenditures using the lagged prices of foods within the same retailer, and non-food spending, respectively. Panel B uses the average prices faced by households who shop at the same retailer, and non-food spending, respectively. Panel C uses the price faced by households if they had shopped in other stores within the same retailer and region, and non-food spending, respectively. Standard errors, clustered by household, shown in parentheses. * p < 0.05. ** p < 0.01.

Instrumenting prices and expenditures
Our main models instrument household-level prices using prices faced by households if they had shopped in other stores within the same retailer. In Table 5, we examine the robustness of our results to different sets of instruments. Panel A specifies lagged prices of food category i in retailer r as instruments for current prices; Panel B uses the average price faced by other households who shopped at the same retailer; and Panel C uses the price faced by households if they had shopped in other stores within the same retailer and region. The different instrument sets aim to specify different 'counterfactual prices' that the household would have faced if they faced different markets in either space or time, excluding their own actual choices. The results support our main findings above: labelling led to a 7.3-11.4% reduction in the quantity of labelled store-brand foods (column 2), which coincided with an improvement in their nutritional composition (column 6). Similarly, we find an improvement in the healthiness of store-brand cakes, but a reduction in the healthiness of branded ones (column 8).

Timing of effects
Next, we explore the timing of the labelling effects to shed more light on when households changed their behaviour and whether this was a persistent or temporary change. We do this by re-estimating Eq. (2), where we separate the terms $\mathit{Labelling}_{ht} \times SB_{hi,j}$ and $\mathit{Labelling}_{ht}$ into two-month bins for the pre- as well as post-labelling period. This allows the labelling effects to evolve flexibly over time. 24 Fig. 1 plots the coefficients of the interaction terms for the different periods pre- and post-labelling introduction for store-brand labelled foods. This shows no strong differential trends between treated and control households prior to the introduction of labelling, suggesting that the common trend assumption holds. Furthermore, we see an almost immediate response to the introduction of labelling, reducing the quantity of store-brand labelled foods. The reduction is visible for the full observation window. However, standard errors are relatively large when splitting the data into two-month bins, especially for periods further away from the date of introduction, so the individual two-month bins are generally not significantly different from zero.

Fig. 1. Timing of the effects of the introduction of labelling on store-brand labelled foods. Notes: Point estimates and 95% confidence intervals are estimated from spell fixed effects regressions that control for year and month dummies, time trends, retailer-specific time trends, and household-level covariates, prices and expenditures. The dependent variable is ln(quantity). Prices and expenditures are instrumented using the prices faced by households if they had shopped in other stores within the same retailer, and non-food spending, respectively.

It is possible that, as labelling was introduced, older stocks (without labelling) remained on the shelves for a period of time. Although this is less likely to be an issue for labelled products, as these are mainly perishable, we explore the sensitivity of our analyses by dropping the first month after introduction of the labelling scheme for each retailer. The findings, shown in Table B.3, are very similar to those above, suggesting this does not affect our overall conclusions.

24 Another advantage of this specification is that it allows us to explore the common trend assumption underlying the triple difference approach. Indeed, the DDD assumes that the trend for treated households is similar to the trend for control households prior to the policy introduction.
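One way to construct the two-month bins used in the timing analysis is sketched below. The conventions are assumptions made for illustration and are not taken from the paper: month 0 is the first month with labels, the bin just before introduction is the omitted reference period, and the function and variable names are hypothetical. Control households, which have no introduction date, are ignored here.

```python
def two_month_bin(months_since_intro):
    """Two-month bin index relative to labelling introduction.

    Assumes month 0 is the first month with labels: months 0-1 fall in bin 0,
    months 2-3 in bin 1, months -2 and -1 in bin -1, and so on.
    """
    return months_since_intro // 2


def event_time_dummies(months_since_intro, store_brand, reference_bin=-1):
    """Dummies for one observation: a labelling bin term and its store-brand
    interaction; the reference bin is omitted (an assumed normalisation)."""
    b = two_month_bin(months_since_intro)
    if b == reference_bin:
        return {}
    return {f"labelling_bin_{b}": 1, f"labelling_bin_{b}_x_sb": int(store_brand)}


for m in (-3, -1, 0, 1, 2):
    print(m, two_month_bin(m), event_time_dummies(m, store_brand=True))
```

Plotting the coefficients on the interaction dummies against the bin index would give a figure of the same form as Fig. 1, with the pre-introduction bins serving as a check on the common trend assumption.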

Spillover effects
One of the assumptions in our empirical specification is that there are no "spillover", or "compositional" effects. In other words, introducing labelling in retailer r should not affect households that shop at other retailers. This assumption would be violated if the introduction of labelling leads to households switching between retailers. We explore this using a DD model, regressing a count of the number of retailers visited by each household in each year-month on the dummies indicating the date of labelling introduction for each of the retailers, household-level characteristics, a time trend, and year, month, and household fixed effects. Given the nature of the dependent variable, we estimate this using a Poisson model. A significant effect of labelling introduction on the number of retailers visited would suggest that households change their shopping patterns in response to labelling. The estimates, shown in Table B.4 in Appendix B, show no significant effects of labelling introduction on the number of shopping trips, mitigating concerns over selection into treatment.
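For intuition on the count model, a Poisson regression with an intercept and a single 0/1 regressor has a closed-form maximum likelihood slope: the log of the ratio of the two group means. The sketch below uses that special case on made-up data; it omits the covariates, trend and fixed effects described above, and the names are hypothetical.

```python
from math import log
from statistics import mean


def poisson_dummy_coefficient(counts, dummy):
    """Poisson MLE slope for a single 0/1 regressor (with intercept):
    the log ratio of the group means."""
    treated = [c for c, d in zip(counts, dummy) if d == 1]
    control = [c for c, d in zip(counts, dummy) if d == 0]
    return log(mean(treated) / mean(control))


# Toy household-month counts of retailers visited (not from the paper).
retailers_visited = [2, 3, 3, 2, 3, 2]
after_labelling = [0, 0, 0, 1, 1, 1]
coef = poisson_dummy_coefficient(retailers_visited, after_labelling)
print(round(coef, 3))
```

A coefficient close to zero would indicate that the expected number of retailers visited is similar before and after labelling introduction.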

Empirical specification and sample
We next report a set of sensitivity analyses with regard to the empirical specification and sample selection. First, we estimate our models of interest using household and retailer fixed effects rather than spell fixed effects. As we show in Table B.5 in Appendix B, the results are qualitatively similar to our main models. Second, we re-estimate our models on all retailers observed in the data (i.e. including smaller retailers, delis, etc. in addition to the 'big 9', but not Sainsbury's for reasons discussed above). Our results are robust to the inclusion of more retailers, as shown in Table B.6. Third, we re-estimate our main models, where we exclude household-months in which the household did not record any shopping for a period of 14 days (rather than seven days in the analysis above). Our results, reported in Table B.7, are generally robust to this sample.

Notes to Table 6: The outcome variables include the (natural) logarithm of quantity (in kg; columns 1-4) and the nutrient profile score (columns 5-8). We provide estimates for the total food basket (columns 1 and 5), as well as for each of the three food groups (columns 2-4 and 6-8). The table shows the estimates from the triple difference specification in Eq. (2), but allows the parameters of interest to differ by socio-demographic characteristics. All specifications control for household-level covariates, time trends, retailer-specific time trends, year and month dummies, and spell fixed effects. All models instrument for prices and expenditure using the prices faced by households if they had shopped in other stores within the same retailer, and non-food spending, respectively. Standard errors, clustered by household, shown in parentheses. * p < 0.05. ** p < 0.01.

Notes to Table 7: The outcome variables include the (natural) logarithm of quantity (in kg; columns 1-4) and the nutrient profile score (columns 5-8). We provide estimates for the total food basket (columns 1 and 5), as well as for each of the three food groups (columns 2-4 and 6-8). The table shows the estimates from the triple difference specification in Eq. (2), but allows the parameters of interest to differ for the traffic light system (TLS) and the hybrid labelling system. All specifications control for household-level covariates, time trends, retailer-specific time trends, year and month dummies, and spell fixed effects. All models instrument for prices and expenditure using the prices faced by households if they had shopped in other stores within the same retailer, and non-food spending, respectively. Standard errors, clustered by household, shown in parentheses. * p < 0.05. ** p < 0.01.

Heterogeneous effects
We next investigate potential heterogeneous effects of labelling by different characteristics of the household. For this, we re-estimate Eq. (2), interacting the labelling dummy with an indicator that equals one for high social class households (reported in Panel A of Table 6), for female main shoppers (Panel B) and for households with children (Panel C).
The findings can be summarised in two main points. First, conditional on our set of covariates and fixed effects, there are no main effects of being in a high social class household, being a female shopper, or having children. Indeed, these groups have shopping baskets of similar quantity and nutritional value (controlling for household size, number of children, etc.). Second, there is substantial heterogeneity in the response to labelling for the different subgroups. For example, while both low and high social class households reduce the quantity purchased of store-brand labelled foods following the introduction of FOP labels, this reduction is larger for the lower compared to the higher social classes (column 2 of Panel A). Likewise, male shoppers reduce the quantity of labelled store-brand foods by more than female shoppers, and show a larger improvement in their nutritional composition (columns 2 and 6, Panel B). Similarly, whilst households without children reduced the quantity of store-brand labelled foods by 11.8% following the introduction of labelling, households with children increased it by 5.6% (i.e. −11.8 + 17.4; column 2, Panel C). Despite that, the largest improvements in the nutritional composition of these foods are seen for households with children. However, this is offset by a worsening in the healthiness of branded foods within the same food group (column 6, Panel C).
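The subgroup effects are read off by adding the interaction coefficient to the baseline labelling effect. A minimal sketch of that arithmetic for households with and without children, using the percentages quoted in the text (the function name is illustrative):

```python
def subgroup_effect(base_effect, interaction=0.0):
    """Total labelling effect for a subgroup: baseline plus interaction term."""
    return base_effect + interaction


# Percentage changes in quantity of store-brand labelled foods (column 2, Panel C).
no_children = subgroup_effect(-11.8)
with_children = subgroup_effect(-11.8, 17.4)
print(round(no_children, 1), round(with_children, 1))  # -11.8 5.6
```

The same addition applies to the social class and gender panels, with the interaction term capturing how much the response of the indicated group departs from the baseline group.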
Finally, we examine potential heterogeneous effects of labelling by different types of labels. Recall that the four retailers that adopted labelling introduced two types of labelling systems (see also Table B.1, Appendix B). In particular, Waitrose and Co-Op introduced a Traffic Light System (TLS), whilst Marks & Spencer and Asda introduced a hybrid system. Table 7 reports the analysis that distinguishes between the two systems. The estimates suggest that our results are driven by the hybrid system, with no clear effects for the introduction of traffic lights. More specifically, the introduction of hybrid labelling led to a reduction in the quantity of labelled store-brand foods of 8.3%, and an improvement in their nutritional quality. 25 It is important to note, however, that whilst these results may be driven by actual differences in the effectiveness of traffic lights versus hybrid labelling, it may also be the case that the types of consumers who shop at the retailers that introduced a hybrid system (i.e. Marks & Spencer and Asda) are simply different from those who shop at retailers that introduced traffic lights (i.e. Waitrose and Co-Op). For example, perhaps those who shop at Marks & Spencer or Asda have a systematically different response to nutritional information provision. Since the introduction of labelling as well as the choice of label were voluntary for retailers, we cannot distinguish between these different explanations.

Conclusion
Food and nutrition labelling has been identified as an important determinant of food choice, with consumer research suggesting that some schemes may help improve the diet and health of the population (Cowburn and Stockley, 2005; Lobstein and Davies, 2009). Nevertheless, there is little evidence of their effectiveness in terms of actual purchasing decisions and nutritional intake (see e.g. Grunert and Wills, 2007; UK House of Lords Science and Technology Select Committee, 2011).
We study the household response to the introduction of nutrition labelling on specific store-brand foods following a recommendation by the UK Food Standards Agency in 2006. Our main analysis specifies a triple-difference approach, exploiting the differential timing of the introduction of labelling by different retailers, whilst distinguishing between store-brand and branded foods. The analysis examines changes in the quantity as well as nutritional quality of households' diets following the introduction of labelling.
Our results suggest that labelling led to a reduction in the quantity purchased of labelled store-brand foods, as well as an improvement in their nutritional composition. This suggests that FOP labelling caused households to substitute between food groups, as well as within food groups, reducing the quantities of labelled foods whilst improving their nutritional quality.
How do these results compare with the literature? It is hard to make a direct comparison with published studies because they evaluate different labelling interventions and use different food aggregations. In general, however, they find small (e.g. Bollinger et al., 2011; Wisdom et al., 2010; Bleich et al., 2016) or no changes (e.g. Elbel et al., 2009) in the calorie content of the diet. For example, Bollinger et al. (2011) find that calorie labelling in Starbucks led to a modest reduction of 14 calories per transaction, and Wisdom et al. (2010) find a reduction of 61 calories in side-dishes and drinks bought at a fast-food chain. Similar to Elbel et al. (2009), we are able to investigate the effects of labelling on other nutrients, in addition to energy. Furthermore, we study the effects on a nutrient profile score that summarises the overall 'healthiness' of the diet. We find that households responded to the introduction of labelling by reducing the total monthly calories, saturated fats, sugars, and sodium of store-brand labelled foods by 9-14% on average, relative to the mean.
Our findings also suggest that the introduction of labelling led to an improvement in the nutritional composition of store-brand cakes. Although this may seem counter-intuitive, as these remained unlabelled throughout our observation window, we interpret this as the introduction of FOP labels inducing households to think more widely about their dietary choices. Indeed, the introduction of labelling per se may have increased the salience of nutritional information, leading to an increase in its use. As such, the introduction of FOP labelling on some products may have incentivised households to also use nutritional back-of-pack information more frequently when making their purchasing decisions. Hence, the increased salience of nutrition information may have prompted households to reconsider the nutritional composition of their grocery basket, making them more conscious of its (un)healthiness. This in turn can lead them to change the nutritional value of their grocery basket more generally, rather than such changes being confined to products with newly introduced FOP labels only. Although this is a speculation, this interpretation is consistent with the psychological literature. For example, examining the effect of nutritional information on psychological factors driving consumer choice of unhealthy foods, Hassan et al. (2010) find that nutrition information strengthens consumer self-control with an actionable target (e.g. reducing consumption of unhealthy foods). Similarly, Baumeister (2002) finds that the provision of objective information reinforces self-control and provides a greater imperative to resist temptation.
We also investigate whether nutrition labels have heterogeneous effects by socio-demographic characteristics. Theory and evidence suggest ambiguous effects (Grunert et al., 2010; Sinclair et al., 2013; Hess and Siegrist, 2012; Feng and Fox, 2018; Larson et al., 2009; Block et al., 2004). On the one hand, health capital theory suggests that higher social classes are better able to process information (e.g. Grossman, 1972). On the other hand, as their opportunity cost is higher, they may choose not to use this information. In addition, the lower socio-economic groups may be less well-informed about the healthiness of their grocery basket. As such, the introduction of labels is more likely to provide 'new' information for these groups, compared to the higher social classes. This in turn may lead to a stronger response in terms of dietary choices. Our results are consistent with the latter hypothesis, suggesting that the reduction in the quantity and the improvement in the quality of labelled store-brand foods is larger for lower social class households. This is interesting from a health equality perspective. Indeed, there are concerns that nutrition labelling, amongst other informational policies, may widen health inequalities because the higher educated are better able to process information (Feng and Fox, 2018; Fox and Horowitz, 2013). We do not find evidence here to support this.
Finally, the reduction in the quantity and the improvement in the quality of store-brand labelled foods appears to be driven by the hybrid labelling system, with no evidence of any effects of the traffic light system. The latter is perhaps surprising, as some research suggests that traffic lights are easier to understand, especially as consumers spend just 4-10 s choosing each product and almost half of adults have difficulty using simple percentages (see e.g. National Heart Forum, 2007a; Which, 2006; Department for Education and Skills, 2003). Indeed, studies have found that simpler information with categorical labels such as stars or letter grades leads to better comprehension and use of information (e.g. Thorne and Egan, 2002; Weil and McMahon, 2003). However, much of this literature compares traffic lights to GDAs, rather than to a hybrid system that incorporates the percent contribution into a traffic light system. Indeed, a review by Hawley et al. (2013) suggests that, if percentages are to be used on a label, they should be accompanied by text (e.g. 'high', 'medium', or 'low') to help with interpretation. As such, hybrid labels may be seen as providing more 'personalised information' to consumers, suggesting they may be more effective (Loewenstein et al., 2014). Similarly, as the responsiveness to nutrition labels varies across socio-demographic characteristics (see e.g. Sanjari et al., 2017), another argument is that hybrid labelling combines information from the two systems. If different consumers respond to different pieces of information, it may be that the hybrid system, being a combination of two pieces of information, appeals to a larger group of consumers.
Note, however, that whilst our findings that distinguish between label type may be driven by actual differences in the effectiveness of traffic lights versus hybrid labelling, it may also be the case that the types of consumers that shop at the retailers that decided to introduce traffic lights are simply different from those that shop at retailers that decided to introduce a hybrid system. Since the introduction of labelling and the choice of label were voluntary, we cannot distinguish between these different explanations.
There are several limitations of this research. First, although the data are the most highly disaggregated data available on household supermarket purchases, allowing us to explore the effect of labelling on actual purchases (as opposed to under experimental conditions), it is known that survey fatigue leads to under-reporting of top-up purchases such as bread and milk (Leicester and Oldfield, 2009). We attempt to mitigate any such fatigue effects by focusing on the main shopping trip. However, because of this, our results may not generalise to all purchases, particularly those made 'on-the-go' and any additional top-up products.
Second, we note that FOP labelling might have affected retailer decisions. As our data capture consumer choices, we are unfortunately limited in the extent to which we can investigate such retailer behaviour. Nevertheless, our descriptive analysis explores the effects of labelling on retailers' prices, promotions, pack size, reformulation, and the introduction/withdrawal of food products. The findings suggest that retailers may have reformulated foods and brought forward the discontinuation of products so that it took place before labelling was introduced. Hence, the findings discussed above should be interpreted with this in mind: in addition to picking up the effect of labelling on household demand, they may also partially capture the effect of retailers' (anticipatory) decisions. Furthermore, it may be that, as retailers added labels to their store-brands, they also changed their packaging. Although we find no effect of FOP labels on changes in pack sizes, there may have been changes in the presentation of products that also affect purchasing decisions, but which we are not able to capture here. Having said that, any reformulation by retailers is unlikely to fully explain the improvement in the quality of households' shopping baskets. Indeed, the retailer analysis suggests that reformulation improved the nutritional quality of store-brand labelled as well as unlabelled foods, indicating that retailers' reformulation affected all store-brand products, rather than being restricted to store-brand labelled foods only. Although the retailer analysis shows no significant effect of labelling on the nutritional quality of store-brand cakes, the estimate is relatively large. Hence, it is possible that this mechanism also explains some of the improvement observed in the nutritional quality of cake purchases.
Finally, the policy of FOP labelling that we examine here was introduced at a time of few other food policies. 26 There was one policy, however, that overlaps in its timing with FOP labelling. In particular, in an effort to reduce excessive salt consumption, the UK government introduced a campaign in March 2005. This led to the reformulation of foods by manufacturers to reduce their salt content (Griffith et al., 2016a). However, it is unlikely that our results are confounded by this campaign. First, if the improvement in nutritional quality that we find was driven by the salt campaign rather than the introduction of FOP labelling, we would expect to find improvements across all foods, rather than those restricted to labelled foods only. Furthermore, our results suggest reductions in nutrients other than salt, including calories, fats and sugars, suggesting that the improvements in dietary quality were driven by a policy that targeted more than just salt. Second, our results are specific to store-brand foods, with no significant changes observed among branded foods, whereas the salt campaign was not brand-specific. Third, we find no evidence to suggest that the parallel trend assumption is violated, indicating no large differences in food purchases up to 12 months prior to the introduction of labelling.
Hence, we believe that this study provides evidence on the effectiveness of nutrition labelling using a relatively clean identification strategy on a large longitudinal and highly disaggregated data source. One question is whether the analysis is generalisable to today's environment. This is difficult to answer; it is possible that individuals now pay more or less attention to nutrition labels than when they were first introduced, but the effect of this, if any, may go either way. Hence, we cannot assess whether the age of the data and of the policy investigated affect the generalisability of our findings to today's nutritional environment. Instead, we encourage future research to explore this setting in more detail and to investigate whether later adoptions of nutrition labelling (and potentially their interactions with other policies such as taxes) had similar effects to those found here.
European Research Council (ERC) under ERC-2009-AdG grant agreement number 249529. Data supplied by TNS UK Ltd, trading as Kantar Worldpanel. The use of TNS UK Ltd. data in this work does not imply the endorsement of TNS UK Ltd. in relation to the interpretation or analysis of the data. All errors and omissions remain the responsibility of the authors.