Hedonic methods for baskets of goods

• We apply hedonic price methods to large complex baskets of goods.
• We combine hedonic price methods with revealed preference.
• We estimate willingness to pay for organic using scanner data with thousands of goods.
Existing hedonic methods cannot be easily adapted to estimate willingness to pay for product characteristics when willingness to pay depends on a very large basket of goods. We show how to marry these methods with revealed preference arguments to estimate bounds on willingness to pay using data on purchases of seemingly impossibly high dimensional baskets of goods. This allows us to use observed purchase prices and quantities on a large basket of products to learn about individual households' willingness to pay for characteristics, while maintaining a high degree of flexibility and also avoiding the biases that arise from inappropriate aggregation. We illustrate the approach using scanner data on food purchases to estimate bounds on willingness to pay for the organic characteristic.
© 2013 The Authors. Published by Elsevier B.V. All rights reserved.


Introduction
Researchers, policy makers and firms often want to estimate consumers' willingness to pay for a characteristic of a good. For example, there is much interest in estimating willingness to pay for organic products (see, for instance, Blow et al., 2008). For small-scale problems, hedonic or discrete choice methods can provide estimates. However, these methods are not tractable when the number of relevant products is large or the characteristic space is large.
We consider the (common) situation in which a consumer buys a large basket of goods, each good having many characteristics. We propose a method to marry hedonic price methods to revealed preference methods for analysing these large and complex baskets of goods.
It has long been understood that analogues of classic revealed preference arguments apply to hedonic prices (see for example Scotchmer (1985), Kanemoto (1988), Pollak (1989), and Pakes (2003)). These papers show that hedonic prices can be used to bound willingness to pay and willingness to accept. We build on Scotchmer (1985) and Pollak (1989) to develop the argument when consumers buy a basket of goods. The idea is simple. The fact that a consumer paid some premium to purchase a basket of goods implies that the consumer must have been willing to pay at least as much as that premium.
We combine ideas from the hedonic pricing literature (Nesheim, 2008; Bajari and Benkard, 2005) with revealed preference ideas from Blow et al. (2008) to analyse willingness to pay when consumers purchase continuous quantities of a high dimensional basket of goods. A major benefit of our approach is that we can exploit rich data without introducing aggregation bias and without making unnecessary separability assumptions. Under very minimal assumptions we are able to estimate bounds on willingness to pay; with more restrictive assumptions (but ones that are common in the literature) we can obtain point estimates of households' willingness to pay.
We illustrate our approach by estimating bounds on willingness to pay for organic foods using data on the shopping baskets of a large number of households. These estimates can inform regulation over the licencing and labelling of organic foods, increase government knowledge about consumer valuations of agricultural and environmental policies, and help give firms a better understanding of the potential profitability of new product lines.

Theoretical background
To develop intuition, we first describe bounds on willingness to pay in the single product case. Then we extend the analysis to the choice of a basket of products.

Demand for a single product
Let $z \in Z \subseteq \mathbb{R}^n$ be the vector of all product characteristics that affect consumer choice. Let $z(1)$, the first coordinate of $z$, be the characteristic of interest. In our example, $z(1) = 1$ if a product is organic and $z(1) = 0$ otherwise. The product price is given by a hedonic price function $p = p(z)$. Consider a consumer with characteristics $x_h$ who buys a single unit of an organic product with product characteristics $z^o$ and price $p^o$ and elects not to buy a non-organic product with characteristics $z^n$ and price $p^n$. Assume that the two products are identical in all dimensions other than organic. Let the consumer's indirect utility function be $v(x_h, z, p)$, where $v$ is increasing in $z(1)$, continuously differentiable in $p$ and strictly decreasing in $p$. If the consumer chooses the organic product, then revealed preference dictates the consumer obtains weakly greater utility from the organic product:
$$v(x_h, z^o, p^o) \ge v(x_h, z^n, p^n).$$
By the mean value theorem, there exists some $p^*$ between $p^n$ and $p^o$ such that $v(x_h, z^o, p^o) = v(x_h, z^o, p^n) + \frac{\partial v(x_h, z^o, p^*)}{\partial p}\,(p^o - p^n)$. Substituting into the inequality and dividing by $-\partial v(x_h, z^o, p^*)/\partial p > 0$ gives
$$\frac{v(x_h, z^o, p^n) - v(x_h, z^n, p^n)}{-\,\partial v(x_h, z^o, p^*)/\partial p} \ge p^o - p^n.$$
The left side of this expression is the willingness to pay for the organic characteristic. The right side is the organic price premium. For all consumers who buy organic, the price premium defines a lower bound on the willingness to pay for organic. For all consumers who do not buy organic, the price premium provides an upper bound on the willingness to pay for organic.
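The single-product argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its arguments and the example prices are hypothetical, and the zero lower bound for non-buyers relies on the assumption above that utility is increasing in the organic characteristic.

```python
import math


def single_product_wtp_bounds(price_organic, price_nonorganic, bought_organic):
    """Return (lower, upper) bounds on willingness to pay for organic.

    Assumes the two products are identical except for the organic
    characteristic, so the price difference is the organic premium.
    """
    premium = price_organic - price_nonorganic
    if bought_organic:
        # Buying organic reveals a willingness to pay of at least the premium.
        return (premium, math.inf)
    # Not buying organic reveals a willingness to pay of at most the premium;
    # the lower bound of zero follows from utility increasing in z(1).
    return (0.0, premium)


# Hypothetical prices: organic at 1.25, non-organic at 1.00.
buyer_bounds = single_product_wtp_bounds(1.25, 1.00, bought_organic=True)
non_buyer_bounds = single_product_wtp_bounds(1.25, 1.00, bought_organic=False)
```

A buyer and a non-buyer facing the same prices therefore bracket the premium from opposite sides, which is all that revealed preference alone can deliver without further assumptions on utility.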

Demand for a basket
Let $B_g$ be the set of products in category $g$ and let $B = \bigcup_g B_g$ be the set of all products. For each product $b \in B_g$, let $z_b \in Z_g$ be its vector of characteristics. Define $z = \{z_b\}_{b \in B}$ as the vector of characteristics of all products.
Let $v = v(x_h, z, p)$ be the maximum utility obtainable given market prices $p$ and product characteristics $z$. Each consumer chooses a vector of quantities of each product, $q$, to minimise the cost of attaining the fixed utility level $v$. The consumer's total expenditure is $p \cdot q_h$, where $q_h$ is the quantity vector chosen by household $x_h$. In general, the basket purchased will include both organic and non-organic products and the fraction organic will vary across consumers.
Consider what the consumer would have paid to obtain the same utility level if all products were converted to non-organic products with non-organic prices, and let $p^n = \{p^n_b\}_{b \in B}$ and $z^n = \{z^n_b\}_{b \in B}$ be the vectors of prices and characteristics in the counterfactual world where all products are converted to non-organic varieties. For household $x_h$, the willingness to pay for organic is the difference between these expenditures [equation missing in source]. It is the negative of compensating variation.
If we assume that the utility function is known, then we can calculate a point estimate of willingness to pay using the price premia. More generally, if the utility function is not known, we cannot calculate willingness to pay. Nevertheless, revealed preference gives a lower bound: by choosing to purchase $q_h$, the consumer has revealed that they are willing to pay at least $(p - p^n) \cdot q_h$ to purchase organic. This follows immediately from cost minimisation [equation missing in source].
We can also compute various upper bounds for willingness to pay by considering counterfactual bundles in which some non-organic products are converted to organic [example missing in source]. In summary, for each consumer we can calculate lower and upper bounds on willingness to pay for organic using (4) and (5) [equations missing in source].
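The lower bound described above is an inner product of per-unit price premia with purchased quantities. The Python sketch below shows that calculation for a tiny hypothetical basket; the function name and all numbers are illustrative assumptions, and the counterfactual non-organic prices are simply taken as given rather than estimated.

```python
def basket_lower_bound(prices, nonorganic_prices, quantities):
    """Lower bound on willingness to pay for organic: (p - p^n) . q_h.

    prices            -- prices actually paid, one entry per product
    nonorganic_prices -- counterfactual prices if each product were non-organic
    quantities        -- quantities purchased by the household
    """
    if not (len(prices) == len(nonorganic_prices) == len(quantities)):
        raise ValueError("all three vectors must have the same length")
    return sum(
        (p - p_n) * q
        for p, p_n, q in zip(prices, nonorganic_prices, quantities)
    )


# Hypothetical three-product basket: only the first product is organic,
# so the other two have p equal to p^n and contribute nothing to the bound.
p = [1.25, 2.00, 0.50]
p_n = [1.00, 2.00, 0.50]
q_h = [4, 1, 10]
lower_bound = basket_lower_bound(p, p_n, q_h)
```

Because non-organic products carry a zero premium, only the organic items in the basket contribute, so the calculation stays cheap even when the full price vector is very long.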

Data
We use data from the 2004 Kantar Worldpanel for the UK to estimate (4) and (5). Households record purchases of all food, toiletries and cleaning products that are brought into the home using hand-held scanners. Prices are recorded from till receipts collected from the households. We use information on prices, quantities and characteristics of food items purchased for home consumption by 16,881 households. The sample contains data on more than 11 million purchases. Leicester and Oldfield (2009), Griffith and O'Connell (2009) and Griffith and Nesheim (2010) provide further description of the data.
We use data on 75 food categories where organic is a relevant characteristic. Total expenditure in our sample of households is £12.8m (grossed up using sampling weights it is £19.7bn). Summary statistics for the 75 food categories are provided in Griffith and Nesheim (2010). Overall, 2.1% of expenditure is on organic products, but there is substantial variation from 0.4% of "Fresh Bacon Rashers" to 28.6% of "Chilled Meat and Vegetable Extract". Estimating willingness to pay for organic requires the analysis of a very large basket of goods.
Organic purchases also vary across households. Just under 20% of households never buy any organic products and over a quarter buy only a very small amount (less than 0.5% of total expenditure). However, 37% of households spend more than 1% of their budget on organic products and 7% of households spend over 5% of their budget on organic products. These numbers illustrate the tremendous heterogeneity in demand for organic products, and that organic is an important expenditure category for a significant fraction of the population.

Hedonic model
To evaluate willingness to pay we estimate a hedonic price model for each of the 75 food categories. In this example we assume a linear form, but the hedonic model could take a more flexible functional form. Because we have very rich data on the characteristics of individual products, and because most characteristics are discrete, the linear form is in fact quite flexible. Let $(b, r, s, t)$ index items, regions, store types and time. For each product category, we estimate a hedonic regression of the form
$$\ln(p_{brst}) = \alpha_1 \delta_t + \alpha_2 \kappa_{bt} + \alpha_3 \phi_r + f(z_{bs}) + \varepsilon_{brst} \qquad (6)$$
where $\delta_t$ is the vector of month dummies, $\kappa_{bt}$ is a vector of indicators for special offers (ticket price reduction, multi-pack purchase and extra free), and $\phi_r$ is a vector of regional dummies. The coefficients of interest are those on the organic characteristic, which might vary with other product characteristics and enter through $f(z_{bs})$. The residual $\varepsilon_{brst}$ captures unobserved product characteristics that are mean independent of the observed characteristics.
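To make the structure of regression (6) concrete, the following Python sketch fits a log-price regression on dummy variables by ordinary least squares. It is a simplified illustration rather than the estimation code used for the paper: the data are synthetic and noise-free, $f(z_{bs})$ is reduced to a single organic dummy, the coefficient values are invented, and the small `ols` helper is a hypothetical hand-rolled solver for the normal equations.

```python
import itertools


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented "true" coefficients for the synthetic log-price equation.
TRUE = {"const": 0.0, "month2": 0.02, "region2": 0.03,
        "offer": -0.10, "organic": 0.15}
names = list(TRUE)

# One observation for every combination of the four dummies.
X, log_price = [], []
for dummies in itertools.product([0.0, 1.0], repeat=4):
    row = [1.0, *dummies]
    X.append(row)
    log_price.append(sum(TRUE[n] * value for n, value in zip(names, row)))

beta = dict(zip(names, ols(X, log_price)))
organic_coefficient = beta["organic"]
```

With real scanner data the design matrix would contain many more dummies (months, regions, offers, stores and product characteristics) and the residual would not be zero, so a sparse or fixed-effects estimator would be a more practical choice than this dense solver.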

Hedonic price estimates for milk
To illustrate the approach, we first present results for a single category, milk, in Table 1. The first column includes only month and region effects along with the organic characteristic interacted with fat content. The adjusted $R^2$ for this regression is only 0.065. The interactions between organic and fat content are not significant (either individually or jointly): firms in the UK do not charge differential premia on organic depending on the fat content. In columns 2 and 3, we drop these interactions and in column 3 we add in the full set of characteristics including package size and type, variety of milk, store fascia in which purchased and whether on special offer. Many of these are statistically significant, and the estimated organic premium declines significantly. The adjusted $R^2$ increases to 0.726. The additional characteristics explain a substantial proportion of the variation in prices. In the final column we include interactions between the organic characteristic and store fascia. Across all stores, the average price premium for organic milk is 15% and ranges from 0% at Asda to 13% at Tesco to 30% at Waitrose. Since the market share of organic milk is 2.2%, these results imply that roughly 2.2% of households are willing to pay a premium for organic milk of at least 15%, while 97.8% of households are willing to pay no more than 15%. However, those who buy organic milk at Waitrose reveal a lower bound on willingness to pay of 30%.

Hedonic price estimates for all food categories
We estimate 75 separate hedonic regressions of the form of (6). Each regression includes a set of characteristics that is common to all categories, as well as a set of category specific characteristics. The category specific characteristics vary in number and type. For example, there are over 200 flavours of soup and over 250 flavours of yoghurt. Eggs, on the other hand, have relatively few characteristics: whether they are barn reared or free range, egg size and whether they are branded. The key point is that, as with milk, many of the characteristics are correlated with the organic characteristic, so failing to control for them will lead to biased estimates of the organic price premia. Complete results can be found in Griffith and Nesheim (2010).
The organic coefficients vary a great deal both across product categories and across stores. Of the 595 potential organic-fascia coefficients we are able to identify 518 (some stores never sell an organic version of some products). Of these, 462 are positive and 338 are significantly so (at the 5% level). The unweighted mean of the price premia is 0.40 (organic products are roughly 40% more expensive) and the median is 0.38. Asda and Safeway have low prices on average and have the smallest mean and median price premia as well as the most categories (8) with price premia less than or equal to zero. The other stores have higher mean and median organic price premia and fewer categories (4 or 5) with non-positive coefficients. In all cases, the range of positive price premia is from zero to nearly 125%. Marks and Spencer has the highest mean and median markup, followed closely by Sainsbury's, Waitrose and Tesco.
The adjusted $R^2$ values are high (with a few exceptions), suggesting that we have captured most of the product characteristics that affect pricing. We have detailed information on all product characteristics judged to be important by market research firms, including characteristics that vary over time (such as being on special offer) and space (such as being sold in a different store). As indicated by the adjusted $R^2$ values, measured characteristics explain most of the variation of prices in our data.

Bounds on willingness to pay for baskets
We aggregate these individual price premia to obtain estimates of (4) and (5) for each household. The dimensions of the vectors $p$ and $q_h$ are each over 4 million (47,854 barcodes by 12 months by 7 stores). Each element of $(p - p^n)$ is computed using our estimated hedonic coefficients from (6). Households spend different amounts on food, so we present the estimates as a share of expenditure [equations (7) and (8) missing in source].
For a very small number of households (18) we estimate a negative lower bound on their willingness to pay for organic. For a much larger number (4121, or 22.7%) we estimate a lower bound of zero on willingness to pay for organic, either because they purchase no organic products or because the net price premia across all products purchased cancel out. The median lower bound on the willingness to pay is 0.2%, with just over 12.5% having a lower bound on their willingness to pay that is 1% or more. The median upper bound is 31.5%, with most households having a value between 20% and 40%.
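Although the price and quantity vectors have millions of elements, each household buys only a small number of barcode, month and store combinations, so the bound can be computed from the non-zero entries alone. The Python sketch below illustrates this for the lower bound expressed as a share of expenditure. The barcode labels, prices, premia and quantities are hypothetical, and the per-unit premium is taken as given rather than derived from the hedonic coefficients.

```python
# Dimension of the full price and quantity vectors reported in the text.
N_BARCODES, N_MONTHS, N_STORES = 47_854, 12, 7
full_dimension = N_BARCODES * N_MONTHS * N_STORES

# Sparse record of one hypothetical household's purchases:
# (barcode, month, store) -> (price paid, organic premium per unit, quantity)
purchases = {
    ("milk_organic_1l", 3, "Waitrose"): (1.00, 0.25, 4),
    ("bread_800g", 3, "Tesco"): (1.50, 0.00, 2),
    ("eggs_6", 4, "Asda"): (1.00, 0.00, 3),
}


def lower_bound_share(purchases):
    """Lower bound on willingness to pay as a share of total expenditure."""
    expenditure = sum(price * qty for price, _, qty in purchases.values())
    premium_paid = sum(prem * qty for _, prem, qty in purchases.values())
    return premium_paid / expenditure


share = lower_bound_share(purchases)
```

In this example the household spends 10.00 in total and pays 1.00 in organic premia, so the lower bound is 10% of expenditure, far above the median of 0.2% reported in the text.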
The figures show a contour plot of the joint distribution of the lower and upper bound shares, (7) and (8), across households. The first figure includes all households; the second focuses on the part of the plot where most households are located. From the figures, one can see that the largest mass of households is located in the region where the lower bound ranges from zero to 0.2% of expenditure and the upper bound ranges from 35% to 45%.

Summary and conclusions
Rich data on spending behaviour are now widely available in a number of countries. These data offer great potential to learn about willingness to pay for many different characteristics. However, their use has been in part thwarted by the sheer scale of the data. Existing revealed preference approaches to estimating willingness to pay cannot deal with the large dimensionality of these data.
Methods such as those of Blow et al. (2008) illustrate how assumptions about separability and the absence of time varying demand shocks, combined with panel data, can be used to obtain point estimates of willingness to pay, at least for a fraction of the population. We extend the ideas developed in Blow et al. (2008) by incorporating market pricing equilibrium conditions, which help to reduce the dimensionality of the problem while allowing us to retain much of the flexibility of their approach. We use standard assumptions about market pricing equilibrium and consumer revealed preference behaviour to compute consumer specific bounds on willingness to pay. We show how to aggregate estimates of willingness to pay for individual products across a basket of products in a manner that is consistent with consumer theory. These bounds are Laspeyres-style price indexes for differentiated products. In order to recover point estimates of the willingness to pay we would need to make further assumptions about the structure of consumers' preferences. While this is certainly feasible for individual product categories, further work needs to be done to develop a tractable method to analyse the entire food basket.
We illustrate the application of these methods using rich data on households' purchases of food to estimate lower and upper bounds on willingness to pay for the organic characteristic in food. Our results suggest that there is a large amount of heterogeneity in willingness to pay for organic products.