Is treatment “intensity” associated with healthier lifestyle choices? An application of the dose response function

Healthy lifestyle choices and doctor consultations can be substitutes or complements in the health production function. In this paper we consider the relation between the number of doctor consultations and the frequency of patient physical activity. We use a novel application of the Dose-Response Function model proposed by Hirano and Imbens (2004) to deal with treatment endogeneity under the no unmeasured confounding assumption. Our application takes account of unobserved heterogeneity and uses dynamic non-linear models for the treatment and outcome variables of interest. Using seven waves of the British Household Panel Survey, we find that higher treatment intensity and frequency of physical activity are inversely related. We show that accounting for both treatment selection and unobserved heterogeneity halves the size of this relationship. An additional doctor consultation is associated with a 0.5 percentage point reduction in the probability of undertaking vigorous physical activity. Our results hold for a sub-sample visiting the doctor for health check-ups, and are shown to be robust using instrumental variables.


Introduction
Within the World Health Organisation (WHO) European Region, almost 77 percent of the disease burden is due to five major non-communicable diseases (NCD): diabetes, cardiovascular diseases, cancer, chronic respiratory diseases and mental disorders. Amongst its nine global targets to combat these diseases, the WHO has included a reduction of physical inactivity and tobacco consumption, and an increase in treatment and prevention of NCD by primary care doctors (World Health Organization, 2014). There is a wide range of activities that primary care doctors can undertake in treating and preventing NCD, including testing, prescribing and providing lifestyle advice to their patients.
A large literature has investigated the determinants of lifestyle behaviours and contacts with primary care doctors (see for example, Manning et al., 1991; Kenkel, 2000; Chaloupka and Warner, 2000; Cawley and Ruhm, 2011; Fernandez-Olano et al., 2006; Morris et al., 2005). Both forms of health investments have common determinants, including socio-economic and demographic factors, preferences, social networks and information. However, little is known about the interaction between these investments. Our aim is to bring together the literature on the determinants of lifestyle behaviours and healthcare utilisation by examining the association between contacts with primary care doctors and healthy lifestyle choices.
There is a substantial literature showing that health status is positively affected by the supply of doctors (see for example, Aakvik and Holmås, 2006; Auster et al., 1969; Gravelle et al., 2008; Or et al., 2005; Robst, 2001; Robst and Graham, 1997). Evidence from the U.S., U.K., Norway and a cross-section of OECD countries shows that increasing the number of doctors per capita decreases mortality rates and improves health-related quality of life.
In a Becker-type economic framework, the effect of contacts with doctors on healthy lifestyle choices is ambiguous (Becker, 2007). Individuals invest in their health to equate marginal utility of this investment with its marginal cost. However, there is a tradeoff between current costs of healthy lifestyle behaviours (e.g. diverting time and resources away from other activities) and future increased life expectancy. In an application of this model Kaestner et al. (2014) identified two offsetting effects that are applicable to the present study.
On the one hand, there is a "competing risk of death effect" as more contacts with doctors might increase the quantity and productivity of health investments which in turn increase life expectancy and the benefit of investments in health. This leads to a positive association between contacts with doctors and healthy lifestyle choices.
On the other hand, Kaestner et al. (2014) pointed out that a "technological substitution effect" might occur if healthy lifestyle choices and contacts with doctors are substitutes in the health production function. This leads to a negative association between contacts with doctors and healthy lifestyle choices because more doctor contacts lower the marginal benefit of other health investments.
Although the direction of this association could have important implications for policies that aim to increase access to health care professionals, only one paper has explicitly investigated this empirical question. Schneider and Ulrich (2008) used two waves of the German Socio-Economic Panel Study (GSOEP) to examine the relation between a patient's health-related behaviour and the probability of visiting a doctor. Patients' health-related behaviours were measured by an indicator that took a value of one if the respondent was smoking and overweight. They used a recursive bivariate probit model with the exclusion restriction that stress directly affects patients' health-related behaviour and does not directly affect visits to the doctor. As patients who are overweight and smoke were more likely to visit the doctor, they found evidence of substitutability between visits to the doctor and healthy lifestyle choices.
Doctors can affect patients' health behaviours by providing lifestyle advice and treatment. Whilst we would expect healthy lifestyle behaviours and lifestyle advice to be either complements or independent of each other, treatment and health behaviours could be substitutes, complements or independent of each other. The only three papers investigating this relationship focused on different target populations and treatment regimens, and found mixed results. Kaestner et al. (2014) used the Framingham Heart Study spanning between 1983 and 2001 to examine the relationship between the introduction and widespread diffusion of statins and health behaviours. They found evidence that statin use is a substitute for healthy diet with a particularly large increase in female obesity (33% of the mean). They also found evidence of an increase in moderate alcohol drinking of about 15% of the mean and a decrease in sedentary activity among men. Using pooled cross-sectional data from the Health Survey for England, Fichera and Sutton (2011) found that prescription of lipid-lowering drugs complemented quitting smoking behaviour in patients with cardiovascular diseases, but smoking cessation advice was not effective in reducing smoking. Fichera et al. (2014) used a unique linkage between three waves of the English Longitudinal Study of Ageing and practice-level data on the volume of treatments delivered by doctors. They decomposed doctors' effort into an element induced by the payment system and a discretionary element, using an exogenous change in doctors' remuneration that led them to increase rates of prescription and disease control. They found that increases in the rates of disease control decreased patients' cigarette consumption.
In this paper we examine the association between the "intensity" of treatment and the level of effort that individuals exert in protecting their own health. We measure treatment intensity as the number of contacts with a primary care doctor and individuals' health behaviours as the frequency of their physical activity, their smoking and alcohol consumption in seven waves of the British Household Panel Survey. This is a new empirical application of the relation between treatment and healthy lifestyle choices, as Kaestner et al. (2014), Fichera and Sutton (2011) and Schneider and Ulrich (2008) did not examine the intensity effect of treatment and Fichera et al. (2014) could only focus on practice-level treatment rates.
Selection into the treatment might confound the relation between intensity of treatment and frequency of physical activity. We attempt to mitigate this problem with a novel application of the dose-response function developed by Hirano and Imbens (2004). Our methodological contribution is to develop a dose-response function in a dynamic panel data model as follows. Firstly, we use a panel grouped count data model of visits to the doctor. Secondly, from this model we obtain the Generalised Propensity Score (GPS) to identify individuals who are predicted to have the same level of treatment but have different actual treatment levels. Finally, we estimate a dynamic random effects (RE) ordered probit outcome model of the frequency of physical activity measured at time $(t + 1)$ including the GPS from the treatment model and frequency of physical activity, both measured at time $t$. This is the first methodological application combining the continuous treatment approach with dynamic panel data models. Identification is provided by comparing individuals with different numbers of contacts with the doctor, but the same predicted "intensity" of contacts based on their personal characteristics. The dose-response function uses the GPS to capture the confounders that affect both visits to the doctor and healthy lifestyle choices. It controls for confounding by (complex functions of) observable factors but does not deal with unobserved confounding. We test the robustness of the results to this limitation using fixed effects models and instrumental variables.
The rest of the paper is structured as follows. Section 2 describes the data and the summary statistics. Details of our econometric methodology are examined in Section 3. Section 4 discusses the results. Section 5 concludes.

The British Household Panel Survey (BHPS)
The BHPS is an annual survey of each adult (16 years of age and older) member of a nationally representative sample of more than 5000 households, making a total of approximately 10,000 individual interviews.
In this survey individuals are asked "Since [last 12 months], approximately how many times have you talked to, or visited a GP or family doctor about your own health? Please do not include any visits to a hospital" with the possible answers being: none; one or two times; three to five times; six to ten times; and more than ten times. Individuals are not asked for reasons for their GP visits.
In the main analysis, we consider physical activity as the proxy for individuals' investments in their health. All individuals in the survey are asked about the frequency of their physical activity in one of a succession of questions that ask about things people do in their leisure time. As this question is asked every other year from 1996 to 2008, we select seven of the 18 waves of the BHPS. From the question: "Please [ . . . ] tell me how frequently you: Play sport or go walking or swimming?" individuals can choose any of the following: "At least once a week; At least once a month; Several times a year; Once a year or less; Never/almost never". We define physical activity in increasing level of frequency, or effort.
We also consider, as supplementary analysis, smoking and alcohol consumption. Smoking is measured as the average number of cigarettes per day and alcohol drinking is a four scale variable (from drinking at least once a week (1) to once a year or less (4)).
We consider a number of questions on individuals' ethnic and educational background, gender, age, family composition and marital status, employment, income and wealth, and geographic location. The treatment, medical consultations, and the outcome, frequency of physical activity, are measured at time $(t + 1)$ with respect to these covariates. All the socioeconomic characteristics are considered potential predictors of the treatment and enter the outcome regression either directly or via the GPS.
We consider a number of dummies indicating whether the respondent is white, black, Asian (Indian, Pakistani, Bangladeshi or Chinese) or of other ethnic background. A set of dummy variables is included to indicate whether the respondent has obtained a university degree, a high school diploma (the U.K. A-level or O-level), the Higher National Diploma, a semi-professional qualification in the U.K. or no qualification at all. We consider the number of children under the age of two years in the household. We dichotomise employment status to indicate whether the respondent is employed (either as an employee or self-employed) as opposed to retired, unemployed, on maternity leave or in another employment status. We have taken the natural logarithm of the equivalised value of household monthly income and deflated it by the consumer price index with 1995 as base year. We also consider the number of rooms in the house as an indicator of wealth. A set of dummy variables is included indicating the geographical region of the U.K. in which the respondent lives.
Respondents are asked to identify the physical health problems and disabilities they are currently suffering from, using a list of 15 physical health conditions. We group these conditions into a homogeneous set of eight dummies: musculoskeletal (e.g. arms, legs, feet and back problems); cardiovascular diseases (e.g. heart problems and high blood pressure); diabetes; skin, head or sight problems; respiratory problems; stomach problems; depression; and other conditions. In addition to the type of conditions, we construct a series of five dummies indicating the number of health conditions: zero to two, three, four, five, and six or more. In order to mitigate reverse causality both the number and type of health conditions are measured at the first wave in which individuals are interviewed.
In supplementary analyses we focus on a subsample of individuals who have visited the doctor at least once for preventive purposes. This is intended to alleviate concerns that unobserved health conditions affect both the propensity to engage in physical activity and doctor visits. Unfortunately, there is no longitudinal data in the UK that asks patients for the reasons why they have visited the doctor. The English Longitudinal Study of Ageing, like the BHPS, reports the type of conditions diagnosed by the doctor and whether the respondent has visited the doctor, but does not contain the number of doctor visits or the reason for visiting the doctor. The Health Survey for England, a cross-sectional survey held since 1991, asks for the number of doctor visits but not the reason for visiting the doctor. Therefore, we restrict one of our supplementary analyses to BHPS respondents who have undergone at least one health check in the last year.
In a series of questions about which preventive health check-ups respondents have undertaken, we select the National Health Service (NHS) check-ups that are most likely done in a primary care practice. These are blood pressure measurement, cholesterol measurement, cervical screening, breast screening and blood tests. In the supplementary analysis, we restrict the sample to individuals who reported having had at least one of these check-ups in the last year. Not all of the visits in this sample would have been for preventive purposes, but this supplementary analysis is focused on a sub-set of the full sample for which a greater proportion of their visits were for preventive purposes. Nearly 2000 individuals per year (about 94% of the sample) reported having one of these tests, leading to a combined sample of 11,736 observations. For our instrumental variables analyses we generate two instruments: i) the number of times that the individual's spouse has visited a doctor; and ii) the average number of consultations with the doctor in the individual's Local Authority District (LAD) of residence.

Summary statistics
In the main analysis we consider the population aged between 30 and 59 years because their need for medical consultations and their health effort is expected to differ substantially from the older population. Summary statistics are reported in Table 1. More than a quarter of this population group have not been to the doctor in the past year. Approximately 37% of respondents visited the doctor once or twice a year and 19% went to the doctor between three and five times a year. More frequent visits to the doctor are rarer, with almost 9% of people going to the doctor six to ten times a year and about 8% of people going more than ten times a year.
Whilst 48% of the sample reported playing sport, walking or swimming at least once a week, 26% of people reported that they never or almost never undertook these forms of physical activity. About 12% of people do these forms of physical activity at least once a month and 9% several times a year.
On average this sample has an equivalised household monthly income of £679 and lives in a house containing five rooms. About 78% of the population is either an employee or self-employed. Approximately 16% of people report having at least a university degree and 51% report having obtained a high school diploma.
The population aged between 30 and 59 years is relatively healthy, with 93% having at most two health conditions and only 2% of people reporting having six conditions or more. About 16% of people report having a type of musculoskeletal problem and 17% report skin, head or sight problems.

Econometric methodology
Our empirical strategy has two main features. Firstly, we predict the propensity score from a (panel) grouped count data model to account for selection of the intensity of treatment, as the number of visits to the doctor depends on individuals' previous behaviour and socioeconomic characteristics. Secondly, we also account for non-linearities and persistency in the effort that individuals exert on their health investments with panel data ordered probit models.
Matching methods have been widely used in the programme evaluation literature of the last two decades (see Augurzky and Kluve, 2007 for an overview). This is largely due to their ability to mimic experimental settings ex post. As many observational studies involve non-binary treatments, recent literature has extended propensity score methods to the cases of multi-valued treatments (Imbens, 2000; Lechner, 2001), and, more recently, continuous treatments (Behrman et al., 2004; Hirano and Imbens, 2004; Imai and van Dyk, 2004). Hirano and Imbens (2004) apply a generalisation of the binary treatment propensity score, namely the generalised propensity score (GPS), to a population of individuals winning the Megabucks lottery in Massachusetts in the mid-1980s. They estimate a dose-response function (DRF) for the effect of the amount of lottery prize won on subsequent labour earnings, using the propensity score to adjust for differences in pre-treatment characteristics.
In this section, we build on the approach developed by Hirano and Imbens (2004). As in the Hirano and Imbens (2004) application, the "intensity" of treatment depends on pre-treatment characteristics. We therefore compare individuals with similar pre-treatment characteristics and similar GPS, i.e. predicted levels of treatment, but different actual treatment levels.
We contribute to the original study by Hirano and Imbens (2004) by developing a dose-response function in a dynamic panel data model as follows. Firstly, we use a (panel) grouped count data regression to model visits to the doctor. Secondly, we obtain the GPS from a panel model conditional on a variety of socioeconomic characteristics. Finally, we estimate a panel random effects (RE) ordered probit model of the outcome measured at time $(t + 1)$ including the GPS from the treatment model and the value of the outcome measured at time $t$.

Implementation and estimation
Let us define $N$ as a random sample from the population of private households in the UK. For each individual $i$ in this sample we observe a set of pre-treatment covariates (i.e. measured at time $t$ before the treatment). These are the $X_{it}$ or the $X_{i0}$ variables described in Section 2.
A number of econometric methodologies have been used to model utilisation of primary care, with the majority of these studies adopting binary and count data models (see for example, Cameron and Trivedi, 1986, 1993; Cameron and Windmeijer, 1996; Jones, 2000; Sarma and Simpson, 2005). Count data models allow the distribution of doctors' visits to be skewed and restrict its predicted values to be non-negative. As we only observe intervals of visits, we base our estimation on a generalisation of count data models to grouped data (see for example, Moffatt and Peters, 2000; Teckle and Sutton, 2008; Brown et al., 2014). In order to account for grouping we modify the log-likelihood function as follows:

$$\ln L = \sum_{i} \sum_{t} \sum_{j=1}^{J} d_{ij(t+1)} \ln P\left(T_{i(t+1)} \in I_j\right)$$

where $T_{i(t+1)}$ indicates the number of visits to the doctor, which is forward looking with respect to the covariates, and $I_1, \ldots, I_J$ indicates each of the five groups of visits to the doctor, with each group containing a set of consecutive integers $\{a_j, a_{j+1}, \ldots, a_{j+k}\}$; the indicator $d_{ij(t+1)}$ takes the value one if $t_{i(t+1)} \in I_j$ and zero otherwise; and $P\left(T_{i(t+1)} \in I_j\right)$ is the probability that individual $i$ reports group $j$, modelled with a Poisson distribution (see footnote 1):

$$P\left(T_{i(t+1)} \in I_j\right) = \sum_{k \in I_j} \frac{\exp\left(-\lambda_{i(t+1)}\right) \lambda_{i(t+1)}^{k}}{k!} \qquad (1)$$

where the mean $\lambda_{i(t+1)}$ depends on the pre-treatment covariates and on a time-invariant individual effect $c_i$, and where we assume the distribution of $c_i$ to be normal. We follow Wooldridge (2005) and specify $c_i = \alpha_0 + \alpha_1 T_{i0} + \alpha_2 \bar{X}_i + h_i$, where $T_{i0}$ indicates the initial condition of each realised value of the visits to the doctor and $h_i$ is the new unobserved time-invariant effect assumed to be normally distributed. The Chamberlain (1984) and Wooldridge (2005) approach allows us to account for potential endogeneity of the initial conditions by including individuals' background characteristics over the entire observed period. $\bar{X}_i$ is the average of the time-varying pre-treatment characteristics described in Section 2. As suggested by Cameron and Trivedi (2015) this can be simply estimated by entering $\bar{X}_i$ as an additional regressor. The choice of covariates depends on the behavioural factors that affect healthcare utilisation.
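The grouped likelihood described above can be illustrated with a minimal sketch. It assumes the five visit groups from the survey question (none; one or two; three to five; six to ten; more than ten) and takes each observation's Poisson mean as given. The function names are hypothetical, and the integration over the random effect $c_i$ that the panel model requires is deliberately left out.

```python
import math

# Visit groups from the BHPS question: 0, 1-2, 3-5, 6-10, more than 10.
# None marks the open-ended upper bound of the last group.
GROUPS = [(0, 0), (1, 2), (3, 5), (6, 10), (11, None)]


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def group_prob(j, lam):
    """P(T in I_j): sum of Poisson probabilities over the integers in group j."""
    lower, upper = GROUPS[j]
    if upper is None:
        # Open-ended group: one minus the probability of all smaller counts.
        return 1.0 - sum(poisson_pmf(k, lam) for k in range(lower))
    return sum(poisson_pmf(k, lam) for k in range(lower, upper + 1))


def grouped_loglik(reported_groups, lams):
    """Sum of log P(T in reported group) over observations (no random effect)."""
    return sum(math.log(group_prob(j, lam))
               for j, lam in zip(reported_groups, lams))
```

In an actual estimation the mean would be a function of the covariates and the log-likelihood would be maximised over its parameters; here the means are simply passed in.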
Education, a proxy for human capital, is related to both health knowledge and self-management (Goldman and Smith, 2002; Cutler and Lleras-Muney, 2006). Income and employment status are proxies for the opportunity cost of time of visiting the doctor. As doctor consultations do not attract user charges in the UK, this is the only cost to the individual. There is also some evidence that ethnicity affects the utilisation of primary care in England (Goddard, 2008). Finally, we consider the number and type of health conditions measured in each individual's first year of observation. As there might be interactions between these characteristics, we include interactions between types of conditions, age and education. This categorisation of these pre-treatment variables achieves good balance between the treatment and comparison groups. Each individual $i$'s prediction from this regression is constrained to lie within the observed group $I_j$ to obtain the "predicted" medical consultations $\hat{T}_{i(t+1)} = \hat{\lambda}_{i(t+1)}$. In other words, let the upper and lower bound of each interval group $I_j$ be $u$ and $l$, respectively. The "predicted" medical consultations $\hat{T}_{i(t+1)}$ is calculated from $\hat{\lambda}_{i(t+1)}$, restricted to the interval $[l, u]$ and including the time-invariant component $c_i$. We then define the value of the outcome variable, the frequency of physical activity, associated with these medical consultations as $Y_{i(t+1)}$. For each individual $i$ there exists a set of potential outcomes $\{Y_i(t)\}_{t \in \mathcal{T}}$, referred to by Hirano and Imbens (2004) as the unit-level dose-response function.

Footnote 1: Brown et al. (2014) used a zero-inflated model. Teckle and Sutton (2008) compare Poisson and negative binomial models and show that the results are not very different. As we are not interested in the treatment model itself, but only use it to predict the propensity score, we use a simple Poisson model.
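One simple reading of a prediction "constrained to lie within the observed group" is a clamp of the predicted mean to the group bounds $l$ and $u$. The sketch below illustrates only that reading, which is an assumption rather than the authors' stated formula; the function name is hypothetical, and the open-ended top group is represented by `upper=None`.

```python
def constrain_to_group(lam_hat, lower, upper=None):
    """Clamp a predicted mean to [lower, upper]; upper=None means no upper bound."""
    if lam_hat < lower:
        return float(lower)
    if upper is not None and lam_hat > upper:
        return float(upper)
    return float(lam_hat)
```

For example, a respondent reporting three to five visits with a predicted mean of 7.0 would be assigned 5.0 under this reading, while a predicted mean of 4.2 would be left unchanged.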
Whilst in the [...] The propensity function is defined by Hirano and Imbens (2004) as the conditional density of the actual treatment given the observed covariates, that is, $r(t, x) = f_{T|X}(t \mid x)$. The GPS is then defined as $R = r(T, X)$. The GPS has a balancing property similar to that of the binary treatment propensity score, that is, within strata with the same value of $r(t, x)$, the probability that $T = t$ does not depend on the value of $X$. More formally, let $D(t) = 1\{T = t\}$; then $X \perp D(t) \mid r(t, X)$. Hirano and Imbens (2004) show that, in combination with a suitable unconfoundedness assumption, this balancing property implies that assignment to treatment is also unconfounded, given the GPS. In this case, a generalisation of the unconfoundedness assumption for binary treatment made by Rosenbaum and Rubin (1983) states that assignment to treatment is weakly unconfounded for a given set of pre-treatment covariates, that is, $Y(t) \perp D(t) \mid X$ for all $t \in \mathcal{T}$. The weak unconfoundedness hypothesis is based on pairwise independence of the treatment with each potential outcome. It also requires $Y(t)$ and the treatment to be "locally" independent at the treatment level of interest $T = t$, not $T$ (Imbens, 2000). This is a key point of our identification strategy as it assumes no unobserved confounding between our outcome and treatment of interest.
The estimation of the dose-response function (DRF) consists of three stages, which will now be explained in turn.

First stage: treatment model and the balancing test
In the first stage, we estimate the time-varying score. We modify the linear model used by Bia and Mattei (2008) and estimate the score from the same dynamic correlated random effects grouped Poisson model in (1), but the predicted score is not constrained to be within the bounds of the group that the individual originally reported. Whereas $\hat{T}_{i(t+1)}$ represents the best prediction, conditional on pre-treatment covariates, of the actual visits to the doctor within each reported group, the conditional expectation of the treatment used to retrieve the GPS represents the visits to the doctor an individual would be expected to make given her pre-treatment characteristics, which may lie outside the bounds of the group in which the actual treatment level lies.
From the grouped count data model we estimate the GPS (Eq. (2)). We follow Hirano and Imbens (2004) and balance the covariates blocking on both the treatment variable, the number of visits to the doctor ($T_{i(t+1)}$), and on the estimated GPS. We implement this procedure by first dividing the sample into three cuts according to the distribution of the actual treatment, namely, those who never visit the doctor, those who visit the doctor between one and five times, and those who visit six times or more a year. As suggested by Hirano and Imbens (2004) we chose the cut-off points to fit the distribution of visits.
Within each cut, we compute the GPS from Eq. (2) at the median of each cut of the treatment. Then, we divide each cut into blocks defined by tertiles of the GPS evaluated at the median, considering only the GPS distribution of individuals in that particular cut of medical consultations. Within each block we calculate the mean difference of each covariate between individuals who belong to a block of the cut and those who belong to other cuts. We combine all the mean differences by using a weighted average with weights given by the number of observations in each tertile of the GPS. This procedure is repeated for each of the cuts and for each pre-treatment characteristic. The key assumption of the first stage is that, conditional on the GPS, there are no statistically significant differences between the characteristics of individuals belonging to different treatment intervals. This does not necessarily imply that there are no differences in their unobserved characteristics.
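The blocking procedure above can be sketched for a single covariate and a single treatment cut. This is an illustration under stated assumptions, not the authors' code: the GPS is taken to be a Poisson density evaluated at a reference treatment level `t_ref` (for instance the median of the cut), each individual's Poisson mean is given, and the function and argument names are hypothetical.

```python
import math
import statistics


def poisson_density(t, lam):
    """Poisson density at treatment level t for an individual with mean lam."""
    return math.exp(-lam + t * math.log(lam) - math.lgamma(t + 1))


def balance_difference(lams, x, cut_members, t_ref):
    """Weighted mean difference in covariate x between one cut and the others.

    lams: predicted Poisson mean per individual (assumed given).
    x: covariate value per individual.
    cut_members: True where the individual belongs to the cut being tested.
    t_ref: treatment level at which the GPS is evaluated.
    """
    gps = [poisson_density(t_ref, lam) for lam in lams]
    in_cut = [g for g, m in zip(gps, cut_members) if m]
    q1, q2 = statistics.quantiles(in_cut, n=3)  # tertile boundaries within the cut

    def block(g):
        return 0 if g <= q1 else (1 if g <= q2 else 2)

    total, weight = 0.0, 0
    for b in range(3):
        inside = [xi for xi, g, m in zip(x, gps, cut_members)
                  if m and block(g) == b]
        outside = [xi for xi, g, m in zip(x, gps, cut_members)
                   if not m and block(g) == b]
        if not inside or not outside:
            continue  # nothing to compare in this block
        n = len(inside) + len(outside)
        total += n * (statistics.mean(inside) - statistics.mean(outside))
        weight += n
    return total / weight if weight else float("nan")
```

A value close to zero for every covariate and every cut would indicate good balance; a formal test would also need the standard error of each difference, which this sketch omits.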

Second stage: the outcome model
The second stage is to model the conditional expectation of the frequency of physical activity $Y_{i(t+1)}$, given the visits to the doctor $T_{it}$ and the GPS, $\hat{R}_{it}$, as follows:

$$Y^{*}_{i(t+1)} = \beta_1 T_{it} + \beta_2 \hat{R}_{it} + \beta_3 \hat{R}_{it}^{2} + \gamma_1 Y_{it} + \gamma_2 Y_{i0} + h_i + e_{i(t+1)}$$

where we include a second-order polynomial of the GPS. Eq. (2) is estimated as a dynamic random effects ordered probit model with $Y^{*}_{i(t+1)}$ being the latent propensity to put effort into physical activity, $h_i$ the individual-specific, time-invariant random component capturing the unobserved heterogeneity of individual $i$, and $e_{i(t+1)}$ the time-variant error term. We use a dynamic random effects ordered probit model to alleviate concerns about unobserved confounding. However, we note that this involves assumptions of strict exogeneity and orthogonality between $h_i$ and the regressors.
The observed measure of frequency of physical activity is related to $Y^{*}_{i(t+1)}$ as follows:

$$Y_{i(t+1)} = m \quad \text{if} \quad \delta_{m-1} < Y^{*}_{i(t+1)} \le \delta_m$$

where individual $i$ reports the $m$th frequency of physical activity, with $M$ being the healthiest option (i.e. $M = 5$ indicates playing sport, walking or swimming at least once per week). If the underlying latent propensity $Y^{*}_{i(t+1)}$ is between $\delta_{m-1}$ and $\delta_m$, the realised value of frequency of physical activity is $m = 1, \ldots, M$. $Y_{it}$ and $Y_{i0}$ are, respectively, the frequency of physical activity prior to $Y_{i(t+1)}$ and the initial condition of each realised value of the frequency of physical activity.
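The mapping from the latent propensity to the observed categories implies a probability for each category. The sketch below computes these probabilities for a standard normal error, given a value of the linear index and a set of increasing cut-points. The names are hypothetical, and the random effect is assumed to be already absorbed into the index.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(index, cutpoints):
    """P(Y = m) for m = 1..M given the latent index and M-1 increasing cut-points."""
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - index) for b in bounds]
    return [cdf[m + 1] - cdf[m] for m in range(len(bounds) - 1)]
```

With five categories, as in the physical activity question, `cutpoints` would hold four values and the last element of the returned list would be the probability of the most frequent activity category.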
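We calculate the average partial effects of the visits to the doctor $T_{it}$ for each realised outcome $m$ as $\mathrm{APE}(T_{it}) = \frac{1}{N} \sum_{i=1}^{N} \partial P\left(Y_{i(t+1)} = m \mid T_{it}, \hat{R}_{it}, Y_{it}, Y_{i0}\right) / \partial T_{it}$.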
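As an illustration of an average partial effect in an ordered probit, the sketch below treats the top category only. Since $P(Y = M) = 1 - \Phi(\delta_{M-1} - \text{index})$, the derivative with respect to the treatment is the treatment coefficient times the normal density at $\delta_{M-1} - \text{index}$. The names are hypothetical and the sketch assumes a unit-variance error, so it ignores any rescaling by the random-effect variance that a random effects model would require.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def ape_top_category(indices, beta_t, top_cutpoint):
    """Average over individuals of dP(Y = M)/dT = beta_t * phi(top_cutpoint - index)."""
    effects = [beta_t * normal_pdf(top_cutpoint - xb) for xb in indices]
    return sum(effects) / len(effects)
```

A negative treatment coefficient yields a negative average partial effect on the probability of the most frequent activity category, which is the direction of the association reported in the abstract.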

Third stage: the DRF plot
The last stage consists of estimating the DRF at each level of the treatment as follows:

$$\hat{E}\left[Y(t)\right] = \frac{1}{N} \sum_{i=1}^{N} \hat{E}\left[Y_{i(t+1)} \mid T_{it} = t, \hat{r}(t, X_{it})\right]$$

This procedure averages over the score evaluated at the treatment level of interest, $r(t, X)$. Hirano and Imbens (2004) [...], compared to the other treatment level $t'$. This is based on the maintained assumption that there is no unobserved confounding between the outcome and treatment of interest.
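The averaging step can be sketched as follows. For each treatment level $t$, the score is evaluated at $t$ for every individual, plugged into the outcome model, and the predicted probabilities are averaged. The sketch makes several simplifying assumptions that are not from the source: the score is a Poisson density, the outcome is the probability of the top activity category, and the index contains only the treatment and a quadratic in the score (lagged outcome, initial condition and random effect are omitted). All names and coefficient values are hypothetical.

```python
import math


def poisson_density(t, lam):
    """Poisson density at treatment level t for an individual with mean lam."""
    return math.exp(-lam + t * math.log(lam) - math.lgamma(t + 1))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dose_response(t, lams, coef, top_cutpoint):
    """Average predicted P(Y = M) at treatment level t across individuals."""
    b_t, b_r, b_r2 = coef
    probs = []
    for lam in lams:
        r = poisson_density(t, lam)  # score evaluated at the level of interest
        index = b_t * t + b_r * r + b_r2 * r * r
        probs.append(1.0 - normal_cdf(top_cutpoint - index))
    return sum(probs) / len(probs)
```

Evaluating `dose_response` over a grid of treatment levels and plotting the results against `t` gives the kind of curve the heading of this subsection refers to.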

Supplementary analyses
We also undertake a range of supplementary analyses. Some of these check the robustness of the results to the model specification; others focus on an alternative sample of individuals who have visited the doctor at least once for preventive purposes and implement alternative econometric techniques.

Alternative model specifications
We modify the main model specification in four ways. First, we modify the RE ordered probit model to include all the covariates used to estimate the GPS directly in the outcome model. Second, we modify the definition of treatment in the outcome models to include the number of doctor visits: i) treated as count variable; ii) with a set of dummies for each interval of visits; and iii) the prediction from the interval regression rounded to the closest integer. Third, we adopt the stratification method suggested by Imai and van Dyk (2004) by estimating the outcome model separately for each tertile of the GPS and then take the weighted average of the coefficients. Finally, we repeat the analysis described in Section 3.1 including the past frequency of physical activity in the treatment model. In this analysis we lose one year of data for the outcome model.
Other specifications provide wider evidence to support the plausibility of the findings. We repeat the analysis for two other health behaviours, smoking and alcohol drinking. We consider the same years used for estimating the physical activity outcome models (i.e. every other year between 1996 and 2008) and focus only on those participating at some level in these behaviours in the first year of observation. We estimate treatment and outcome models for the older population aged 60 and over.

Refinement of treatment
There is a concern that reverse causation might still bias our results even after controlling for past physical activity, carefully selecting the timing between outcome and treatment, and analysing other health behaviours. We attempt to address this concern by selecting a sub-sample of individuals who we know have visited a doctor for preventive activity by undertaking at least one check-up. There is no other micro-level longitudinal data in the UK that asks patients the reason for visiting the doctor. So whilst this is only an attempt to check the robustness of our estimates for a sub-sample of BHPS respondents who access preventive health services, we acknowledge that they might have visited the doctor for other reasons as well. We have not found any recent aggregate figures for the whole of the UK reporting summary statistics on the reasons why people visit the doctor. However, a 2013 study by the Information Services Division in Scotland reported that the ten activities that attract most of the consultations in primary care practices include blood testing (500 consultations per 1000 population), blood pressure monitoring (350 consultations per 1000 population) and general diagnostic tests (210 consultations per 1000 population). Prescription or medication reviews account for about 60 consultations per 1000 population. This might suggest that most consultations are for preventive or monitoring purposes. Using the theoretical model of Kaestner et al. (2014), for this group of people we can think of visits to the doctor and healthy lifestyle choices as two preventive activities in their health production function. A "technological substitution effect" might prevail if doctor contacts lower the marginal benefit of other health investments. For the sub-sample of people who have at least one check-up we re-estimate a RE ordered probit model including all the covariates in the treatment model.

Alternative econometrics techniques
An additional major concern is that dose response models cannot deal with unobserved confounding. The balancing test conditional on the GPS shows that individuals are similar in a wide range of observable characteristics, but it is still possible that they differ in some unobserved components that we have not controlled for. For instance, individuals might differ in their propensity to visit the doctor and this propensity might affect their propensity to engage in physical activity. Additionally, the RE models assume that the unobserved component $h_i$ is uncorrelated with the regressors.
We address this concern in two ways. First, we estimate a linear fixed effects model under the assumption that unobserved heterogeneity is time invariant. Fixed effects models have the advantage of relaxing the RE assumption of orthogonality between $h_i$ and the regressors. Secondly, we use two separate instruments in a two-stage residual inclusion (2SRI) model, an alternative implementation of the two-stage least squares model that is consistent in non-linear models (see for example, Terza et al., 2008). 2SRI models have been recommended for the estimation of count data models (see Wooldridge, 1997, 2002). The first stage is identical to Eq. (1) with the exception of the inclusion of two instruments in two separate models: i) the number of doctor consultations by the individual's spouse, denoted by $T_{-it}$; or ii) the average number of consultations in the individual's Local Authority District (LAD) of residence. These are relevant instruments because they are correlated with doctor visits in the first stage model. However, we note that both instruments have advantages and disadvantages. On the one hand, the estimation sample when using spousal consultations with the doctor is restricted to married or cohabiting respondents for whom we have information on their spouses (about 67% of our main sample of individuals). On the other hand, there is more variation in the consultations of individual i's spouse than in those of her location. There is however a fair amount of variation in the LAD measure of doctor consultations, with a standard deviation of about one (on the 0-4 scale). Finally, whilst one might be concerned that spousal visits to the doctor might affect individual i's engagement in physical activity through the sharing of time resources (e.g. if i had to take his/her spouse to the doctor), it is unlikely that the LAD average of doctor consultations directly affects her own engagement in physical activity. The LAD average should therefore be a valid instrument and alleviate concerns of reverse causality and omitted variable bias, because the average LAD medical consultations should be uncorrelated with individual i's unobserved determinants of physical activity.
The second stage is a random effects ordered probit model of physical activity, given in Eq. (5), where $Y^*_{i(t+1)}$ indicates the propensity to engage in some level of physical activity, with realised value as defined in Eq. (4), $T_{it}$ indicates the number of doctor visits as defined in the previous models, and $\hat{u}_{i(t+1)}$ is the error term predicted in the first stage. We have bootstrapped the standard errors with 1000 replications. When the average LAD medical consultations are used as instrument, we cluster standard errors by the LAD (see footnote 3). For comparability with our previous estimates, we have estimated Eq. (5) on the same sample used for Eq. (3). A large difference between the coefficients of doctor visits, $T_{it}$, instrumented by either measure could indicate whether any of the above mentioned caveats are a serious concern for our identification strategy.
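The logic of residual inclusion can be illustrated with a minimal sketch on simulated data. It is not the specification estimated in this paper: both stages are linear here for brevity (the paper uses a grouped Poisson first stage and a RE ordered probit second stage), there is no bootstrap, and all variable names, coefficients and the data-generating process are hypothetical. In this linear case the 2SRI point estimate coincides with two-stage least squares.

```python
import random

def ols(columns, y):
    """OLS of y on an intercept plus the given regressor columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                factor = a[r][p] / a[p][p]
                a[r] = [v - factor * w for v, w in zip(a[r], a[p])]
    return [a[p][k] / a[p][p] for p in range(k)]

# Hypothetical data-generating process (assumption, not taken from the BHPS).
random.seed(42)
N, TRUE_EFFECT = 20000, -0.5
z = [random.gauss(0, 1) for _ in range(N)]  # instrument, e.g. area-average visits
u = [random.gauss(0, 1) for _ in range(N)]  # unobserved confounder
visits = [1.0 + 0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
activity = [2.0 + TRUE_EFFECT * t + ui + random.gauss(0, 1)
            for t, ui in zip(visits, u)]

# Naive regression ignores the confounder and is biased.
naive = ols([visits], activity)

# Stage 1: regress the treatment on the instrument and keep the residuals.
g = ols([z], visits)
resid = [t - (g[0] + g[1] * zi) for t, zi in zip(visits, z)]

# Stage 2: include the stage-1 residual alongside the treatment.
second = ols([visits, resid], activity)

print("naive coefficient on visits:", round(naive[1], 3))
print("2SRI coefficient on visits:", round(second[1], 3))
print("coefficient on stage-1 residual:", round(second[2], 3))
```

A non-zero coefficient on the stage-1 residual is what signals endogeneity of the treatment in this framework; in an application the standard errors would additionally need to be bootstrapped over both stages.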

Main analyses
On the left panel of Fig. 1 we display the histogram of $T_{it}$, the number of visits to the doctor predicted from the constrained correlated random effects dynamic grouped Poisson model in Eq. (1) (see footnote 4). The spikes indicate the lowest bounds of the original treatment group to which we attribute predicted values that lie outside the reported group. On the right panel of Fig. 1 we display the histogram of the predicted visits to the doctor from the unconstrained treatment model described in Subsection 3.1.1. The difference between the two distributions indicates there are individuals who reported a level of treatment intensity within a given interval but would not have been predicted to do so based on their demographic, initial health and socio-economic characteristics.
We report the results of each of the three stages for estimating the DRF, namely, the treatment model and the balancing test, the outcome model and the DRF plot.
The results of the treatment model are reported in Table 2. We briefly comment on it as it is only used to estimate the GPS. We find that people with higher income are more likely to visit the doctor indicating the income effect prevails over the substitution effect.
Visits to the doctor measured at time $t+1$ are an increasing function of visits measured at time t, indicating strong state dependence. Most initial health conditions have a positive association with visits to the doctor except CVD. As the interaction term of CVD with university degree (see footnote 5) is positive and statistically significant, the coefficient of the initial CVD condition refers to those without a university degree. A recent paper by Labeit et al. (2013), using the same data as the present study and dynamic panel data probit models, found that people with lower education are less likely to take up cholesterol tests and blood pressure checks. This might confirm our result that there are educational differences in healthcare utilisation, particularly for those with cardiovascular conditions.
Tables 3 and 4 report the balancing tests for each of the three cuts of the treatment, unadjusted and adjusted for the GPS, respectively. We report a more conservative significance value at the one percent level because of multiple comparisons over 70 variables. Table 3 shows a high level of statistical imbalance in most of the pre-treatment covariates for each of the cuts. Imbalance is especially high when considering socio-economic characteristics, and the initial types and number of health conditions and visits. This indicates that BHPS respondents who visit the doctor more or less frequently differ in their observed characteristics. In Table 4 we show that after adjusting for the GPS we obtain a very good balance for all of the pre-treatment characteristics. This indicates that conditional on the GPS, BHPS respondents visiting the doctor more or less often are similar to each other. However, we cannot assert that they have similar unobserved characteristics.
Table 5 reports the coefficients of the outcome models. We first compare the random effects dynamic ordered probit model (Model IV) to the pooled and static models (Models I-III) and then we discuss the size of these effects in Table 6. Each model includes visits to the doctor measured at time t, but whilst Model (I) omits the GPS, all the other models include it. Model (IV) additionally includes frequency of physical activity measured at time t and its initial conditions. Each of these models makes a different assumption about unobserved heterogeneity, with the dynamic RE model being more complex by allowing for state dependence. Ex-ante we expect the size of the relationship between treatment "intensity" and physical activity to be smaller as we account for unobserved heterogeneity and state dependence.
Footnote 3: Because individuals move between LADs, panels are not nested within clusters. Therefore we clustered standard errors by the LAD individuals were first observed to live in.
Footnote 4: We do not report the results of Eq. (1) because they are the same as the treatment model reported in Table 2.
Footnote 5: All interactions have been omitted from the table in the interest of space.
The statistical significance of the GPS in Model (II) indicates that in Model (I) there were omitted factors affecting both the propensity to visit the doctor and the frequency of physical activity. There is evidence of non-linear effects of the GPS as its higher order terms significantly affect the frequency of physical activity. The size of the GPS is smaller when unobserved heterogeneity is accounted for in Model (III) and reduces by almost half when both unobserved heterogeneity and state dependence are accounted for in the dynamic Model (IV). Physical activity measured at time t is positively associated with frequency of physical activity measured at time $t+1$ as compared to no activity. This association is stronger the more frequently the respondent played sport, walked or swam at time t, indicating very strong state dependence.
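For orientation, the three stages discussed in this section can be summarised in the generic notation of Hirano and Imbens (2004). The block below is a sketch in that generic notation, not a reproduction of Eqs. (1)-(4) of this paper, which use a grouped Poisson treatment model and ordered probit outcome models; the symbols $r$, $\beta$ and $\mu$ are the usual ones from that framework, and $X_i$ stands for the pre-treatment covariates.

```latex
\begin{align}
% Stage 1: the GPS is the conditional density of treatment evaluated at the observed dose
R_i &= r(T_i, X_i), \qquad r(t, x) = f_{T \mid X}(t \mid x) \\
% Stage 2: outcome model conditional on the dose and the GPS
\beta(t, r) &= E\left[\, Y \mid T = t,\ R = r \,\right] \\
% Stage 3: dose-response function, averaging over the covariate distribution at each dose t
\mu(t) &= E\left[\, \beta\big(t,\ r(t, X)\big) \,\right]
       \approx \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}\big(t,\ \hat{r}(t, X_i)\big)
\end{align}
```

The point of the third line is that the GPS is re-evaluated at each candidate dose $t$ for every individual before averaging, so the plotted DRF is not simply the fitted outcome regression at the observed values of the GPS.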

Supplementary analyses
In Table A1 we report estimates when including all the covariates from the GPS regression directly in the outcome equation. The association between physical activity and number of visits to the doctor remains negative and statistically significant and the size of the coefficient is quite close to the one in Model (IV) of Table 5. Including the GPS in the outcome model reduces the curse of dimensionality and improves efficiency because we can just use a term (i.e. the GPS) predicted from a treatment model that includes interactions between covariates and polynomial terms instead of individual covariates.
As shown in Fig. 1 doctor visits have been predicted from a constrained model where only the bounds of the intervals map the actual visits. There might be a concern that this prediction is driving our results. In Table A2 we show this is not the case as the same negative association between visits to the doctor and physical activity holds when treatment is defined as a count variable (Model I), as a set of dummy variables (Model II) or as a prediction from the constrained regression rounded to the closest integer (Model III). Model II shows that the association between doctor visits and physical activity is steeper at higher intensity levels as reported in the DRF plot. Table A3 shows that the relationship between treatment intensity and physical activity is similar when we estimate the outcome model separately by GPS tertiles. In Table A4 we report the results of the outcome model where GPS has been obtained from a treatment model that includes physical activity. We show that our previous results were not driven by the omission of past physical activity from the treatment model as the coefficient of interest has a similar size and statistical significance. This should alleviate concerns about reverse causality.
We find a statistically significant relationship between smoking or drinking and number of visits to the doctor. Table A5 indicates that more frequent visits to the doctor are associated with more drinking and smoking, a similar association to the one found for physical activity.
In Table A6 we report alternative specifications to alleviate concerns of reverse causality and unobserved confounding. All models show that there is still a negative association between doctor visits and physical activity. We report in Table A6 the set of covariates that is shared across all models. Model (I) is a RE ordered probit model estimated on the sample of those who undertake at least one health check. The size of the coefficient is very similar to the one in our preferred specification of a dynamic RE model with the GPS in Table 5. We have also estimated Model (I) using all the specifications in Table 5 and results are very similar (available from the authors on request). Model (II) is estimated with a linear FE model where only time varying covariates have been included. The coefficient on doctor visits is also very similar to our previous specifications.
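The reason a FE specification is informative as a robustness check can be sketched with simulated data. The example below is purely illustrative: the panel dimensions, coefficients and the way the individual effect enters are hypothetical and not taken from the BHPS. It shows that when a time-invariant individual effect is correlated with the treatment, a pooled regression is biased, whereas the within (demeaned) estimator is not.

```python
import random

random.seed(7)
N_PEOPLE, N_WAVES, TRUE_EFFECT = 500, 7, -0.5  # hypothetical panel dimensions

# Each person has a time-invariant effect h that drives both visits and activity.
panel = []  # rows of (person, visits, activity)
for person in range(N_PEOPLE):
    h = random.gauss(0, 1)
    for _ in range(N_WAVES):
        visits = h + random.gauss(0, 1)
        activity = TRUE_EFFECT * visits + 2.0 * h + random.gauss(0, 1)
        panel.append((person, visits, activity))

def slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_transform(rows):
    """Subtract each person's own mean from their visits and activity."""
    sums = {}
    for person, x, y in rows:
        n, sx, sy = sums.get(person, (0, 0.0, 0.0))
        sums[person] = (n + 1, sx + x, sy + y)
    xs = [x - sums[p][1] / sums[p][0] for p, x, _ in rows]
    ys = [y - sums[p][2] / sums[p][0] for p, _, y in rows]
    return xs, ys

pooled = slope([r[1] for r in panel], [r[2] for r in panel])
x_dm, y_dm = within_transform(panel)
fixed_effects = slope(x_dm, y_dm)

print("pooled slope:", round(pooled, 3))
print("within (FE) slope:", round(fixed_effects, 3))
```

The within transformation removes anything constant over time for a person, which is also why only time-varying covariates can be included in such a model.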
Models (III-IV) report the second stage RE ordered probit coefficients of the 2SRI specification described in Eq. (5). The negative association between number of visits to the doctor and physical activity holds when using either spousal visits to the doctor (Model III) or area average visits (Model IV) as instruments. The similarity of the coefficients on doctor visits using these alternative instruments goes some way to alleviate concerns over their limitations. The coefficient on the residuals predicted from the first stage regression is statistically significant, indicating endogeneity of visits to the doctor (Terza et al., 2008). It can be interpreted as evidence that those who have a higher propensity to go to the doctor have a lower propensity to engage in physical activity. As the first stage estimates of these models are similar to those reported in Table 2, we do not report them here, but they are available on request. Both instruments are relevant as they are positively and statistically significantly (at the one percent level) associated with individual i's visits to the doctor (with a coefficient of 0.02 and 0.14, respectively; and a value of the z statistics greater than 10). In Table A7 we show that the negative association between visits to the doctor and physical activity holds for the older population as well. The magnitude of this relation is higher than the one found for the sample aged between 30 and 59 years.

Coefficients of ordered probit models for physical activity measured at $(t+1)$. ***p < 0.01; **p < 0.05; *p < 0.1.

Table 6
Marginal effects of no. visits to the doctor from random effects ordered probit models.

Conclusions
Healthy lifestyle choices and medical consultations can be substitutes or complements in the health production function. Although previous literature (Kaestner et al., 2014; Schneider and Ulrich, 2008) has found evidence of substitutability, medical treatment was measured as a dichotomous variable in these applications. In this paper we have examined the effect of increasing treatment "intensity", the number of doctor contacts, on frequency of physical activity using seven waves of the BHPS.
We have found evidence of a negative association between treatment intensity and physical activity. This relationship is stronger the higher the intensity of treatment. An additional medical consultation is associated with a reduction in the probability of engaging in physical activity at least once a week by 0.5 percentage points, while the probability of not doing any physical activity at all increases by 0.4 percentage points. This association corresponds to a shift of the distribution of physical activity to the left, towards lower frequency of engagement. The changes in moderate physical activity are smaller as, on average, an additional medical consultation is associated with an increase in the probability of engaging in physical activity between once a year and once a month by about 0.05 percentage points.
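The pattern of marginal effects described above, with the highest category losing probability, the lowest gaining it and the middle categories changing little, is a general property of ordered probit models. The sketch below illustrates it with hypothetical cut points, coefficient and index value; none of these numbers are estimates from this paper, and random effects are ignored.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def category_probs(index, cuts):
    """P(Y = j) for an ordered probit with latent index x'b and ordered cut points."""
    cdf = [0.0] + [STD_NORMAL.cdf(c - index) for c in cuts] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) + 1)]

def marginal_effects(index, cuts, beta):
    """dP(Y = j)/dT for a continuous regressor T with coefficient beta."""
    pdf = [0.0] + [STD_NORMAL.pdf(c - index) for c in cuts] + [0.0]
    return [beta * (pdf[j] - pdf[j + 1]) for j in range(len(cuts) + 1)]

# Hypothetical values (assumptions for illustration only).
cuts = [-1.0, -0.5, 0.0, 0.6]   # four cut points, five activity categories
beta = -0.02                    # coefficient on doctor visits
index = 0.3                     # latent index x'b for a representative person

probs = category_probs(index, cuts)
effects = marginal_effects(index, cuts, beta)

for j, (p, me) in enumerate(zip(probs, effects)):
    print(f"category {j}: P = {p:.3f}, marginal effect = {100 * me:+.3f} pp")
```

Because the category probabilities must sum to one, the marginal effects sum to zero: with a negative coefficient, the probability mass lost by the most frequent activity category is redistributed towards the lower categories.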
We have also shown that a simple regression of the frequency of physical activity on the number of visits to the doctor suffers from selection bias and over-estimates the relation between medical consultations and investments in health. We have attempted to mitigate this selection bias problem with a novel application of the dose-response function developed by Hirano and Imbens (2004) that combines the continuous treatment approach with dynamic panel data models.
Our novel methodological application has produced three insights in the modelling of the relation between treatment intensity and healthy lifestyle choices. Firstly, we have shown that selection bias accounts for part of the relation between treatment intensity and healthy lifestyle choices, as there is a 14% reduction of the coefficient of treatment intensity when the generalised propensity score (GPS) is included in the regression. Secondly, the dose-response function with the GPS could lead to efficiency gains as it allows confounders to enter flexibly in the outcome model via the GPS, which can then be stratified and modelled with higher polynomial orders. Our results suggest that accounting for non-linearities in the characteristics determining treatment selection is important, as the second-order polynomial of the GPS is statistically significant in the outcome regression. Finally, combining dynamic models and a dose-response function with the GPS has the advantage of flexibly accounting for treatment selection, unobserved heterogeneity and the dynamic nature of healthy lifestyle choices. We have found the size of selection bias is lower as there is an almost 42% reduction in the coefficient of treatment intensity when we estimate the dose-response function in a dynamic random effects model (i.e. including both the GPS and the lagged values of the outcome variable).
One limitation of our paper is that the measure of frequency of physical activity is limited to playing sports, swimming or walking. Whilst this is the only type of physical activity that is consistently measured across the BHPS sample, we have shown evidence of a negative association between treatment intensity and other healthy lifestyle behaviours such as reducing cigarette and alcohol consumption.
A second limitation, which we share with the study by Hirano and Imbens (2004), is the lack of exogenous variation in treatment. The application by Hirano and Imbens (2004) focused on a cross-section of lottery winnings which, although exogenous, belong to a particular selected sample of players. Although we combine dynamic panel data models with the GPS, we are cautious in asserting we are estimating a causal effect. We have attempted to mitigate this limitation by using 2SRI models with spousal and area-average visits to the doctor as instruments. Both instruments have advantages and disadvantages relating to the amount of variation and the potential for direct pathways to physical activity. Under the assumption of time-invariant unobserved heterogeneity we have estimated a FE model. Although our results are robust to both specifications, we note that time-varying unobserved heterogeneity and omitted variable bias might still be possible.
A third limitation is that, in following Bia and Mattei (2008), Hirano and Imbens (2004) and Imbens (2000), we do not correct the standard errors for the inclusion of the GPS in the outcome model.
A final limitation is that we cannot determine what elements of the treatment generate an inverse relationship with healthy lifestyle choices, as the dataset contains no information on the cause and content of doctor consultations. The non-linear association between doctor visits and frequency of physical activity might be concerning if an unobserved (to us as researchers) health problem has induced patients to initiate a doctor visit. This would generate a non-linear and reverse causal association between doctor visits and frequency of physical activity. There is no longitudinal survey data in the UK that asks respondents the reason for visiting the doctor. Instead, we have shown that our results are robust to restricting our sample to individuals who have had at least one preventative health check-up. Official statistics suggest that the majority of doctor consultations are for preventive purposes. These two pieces of information point in the direction of a substitution between two preventive activities in the health production function. However, we highlight that the data do not allow us to ascertain the reason for visiting the doctor and therefore we cannot give a causal interpretation to the estimates produced in this study.

Interactions between health conditions, age and education not displayed. ***p < 0.01; **p < 0.05; *p < 0.1.

Table A4
Coefficients of ordered probit models for physical activity measured at $(t+1)$ using past physical activity in the treatment model. GPS has been predicted from a treatment model that includes lagged physical activity. ***p < 0.01; **p < 0.05; *p < 0.1.

Table A6
Coefficients of physical activity measured at $(t+1)$ with alternative econometric models.