Estimating the health value added by nursing homes

Measuring performance in healthcare remains a challenge. The use of health outcomes rather than structure and process indicators is considered the way forward, but outcome-based results risk being biased by selection. Accounting for such selection bias is more difficult in settings with small providers and low chances of resident health improvement, as is the case for nursing homes. In this paper we (i) measure the health outcomes of Dutch nursing homes in terms of mortality and avoidable hospitalizations among residents, (ii) adopt a novel approach to test for selection bias and (iii) examine the relationship between outcomes and other nursing home quality indicators and characteristics. Using administrative data on more than 110,000 residents, we estimate the performance of the 849 largest nursing homes in the Netherlands in the period 2015-2019. Controlling for an extensive set of observable case-mix variables, we first test for the presence of selection bias using a distance-based instrumental variable. We do not find any evidence of such a structural bias. While the wide confidence intervals of the estimates display considerable imprecision, our results do reveal substantial differences between top and bottom performing nursing homes. Because the outcome-based estimates turn out to be only weakly correlated with other quality indicators, we conclude that our mortality and avoidable hospitalization-based indicators provide important complementary information. When small sample issues and case-mix differences are adequately accounted for, outcome-based indicators can provide useful policy guidance for quality improvement in nursing homes.


Introduction
Continued increasing demand and limited supply in nursing home care might reduce incentives for nursing homes to improve quality (Ching et al., 2015; Nyman, 1988). To stimulate quality improvements it is therefore important to inform consumers and policy makers by evaluating nursing homes' performance on a regular basis. While outcome-based measures are commonly applied to enhance performance in other types of healthcare institutions, for example in pay-for-performance schemes for hospitals, nursing home care is still primarily evaluated on the basis of structure (e.g. staffing) or processes of care (e.g. use of psychotropic drugs) in most countries (Barber et al., 2021). Since the relationship between structure, processes and outcomes of care is far from straightforward (Donabedian, 2003), it is meaningful to complement such indicators with outcome-based measures. The challenges in doing so for nursing homes are that appropriate outcomes are less easily defined, that one has to rely on self-reported measures and small sample sizes, and that, as in other sectors, it is uncertain whether quality differences persist after correction for observable case-mix differences.
In this paper, we study whether residents' health outcomes may be used to evaluate performance of nursing homes. We examine i) how much variation in health outcomes there is across nursing homes; ii) whether this can be attributed to differences in performance rather than to differences in unobserved resident characteristics; and iii) how structure and process-based quality indicators are associated with these outcomes. We use administrative data from over 110,000 nursing home admissions in the Netherlands linked to data on mortality and avoidable hospitalizations and background characteristics, to estimate the health value added of each of the 849 largest Dutch nursing homes. The identification of these nursing-home-specific effects is complicated by the fact that residents with high or low unobserved health might self-select into particular homes. We address this econometric challenge by testing whether our case-mix corrected estimates can accurately predict the outcomes for (quasi-)randomly admitted residents (i.e. those admitted to the nursing home closest to their prior residence 1 ). Finally, we correlate the outcome-based performance estimates with other quality indicators to verify whether structure and process indicators can explain variation in outcomes and thereby improve understanding of the potential mechanisms involved.
Our paper makes the following contributions to the literature. First, it extends the economic value-added literature by demonstrating that, by estimating a forecast coefficient when exploiting exogenous variation in provider choice, the value-added framework can still be meaningfully employed to test for the presence of selection bias, even when sufficient power to include individual instruments for each provider is lacking. 2 We apply the value-added framework to evaluate the presence of selection bias in performance on outcomes of relatively small entities like nursing homes. The existing value-added literature mainly focuses on larger organizations, like schools, hospitals, skilled nursing facilities or insurance plans, for which there is sufficient exogenous variation to estimate the causal impact and bias for each entity separately (Abaluck et al., 2021; Angrist et al., 2016; Chetty et al., 2014; Deming, 2014; Einav et al., 2022; Helsø et al., 2019; Kane and Staiger, 2008). As in Abaluck et al. (2021), we use the estimated forecast coefficient to examine whether our case-mix corrected outcome scores accurately predict causal variation in individual-level health outcomes.
1 Geographical distance is an important determinant of nursing home choice and unlikely to be related to outcomes (other than through nursing home choice). Since Newhouse and McClellan (1998), this instrument has been used in many other settings in health and nursing home care (Cornell et al., 2019; Geweke et al., 2003; Gowrisankaran and Town, 1999; Grabowski et al., 2013; Helsø et al., 2019; Huang and Bowblis, 2018).
2 This alternative test also stems from the education literature (Angrist et al., 2016; Chetty et al., 2014; Deming, 2014; Kane and Staiger, 2008), and has more recently been applied to the healthcare sector (Abaluck et al., 2021; Helsø et al., 2019).
Second, we contribute to the broader (health) economics literature by examining the predictive validity of case-mix corrected outcome indicators like mortality and (avoidable) hospital (re)admission rates as measures of quality in the long-term care sector. Prior evidence from the hospital sector is mixed: some studies suggest that unobservable patient differences may generate misleading quality estimates (Gowrisankaran and Town, 1999; Hull, 2020), while others find that risk-adjusted outcomes do provide useful quality information (Doyle et al., 2019). These results cannot directly be transferred to the long-term care setting, because of its focus on preventing health deterioration rather than on improving health. The same holds for studies of Skilled Nursing Facilities in the U.S. (Einav et al., 2022; Rahman et al., 2016), where the hospital plays a more prominent role in choosing a facility and many admissions are short stays aimed at discharge back to the community.
Third, we contribute to a better understanding of health outcomes across nursing homes by taking unobserved selection into account. The causal nursing home literature so far has only considered impacts on outcomes of one (often binary) characteristic at a time, like staffing levels, the presence of dementia special care units or ownership (Friedrich and Hackmann, 2021; Grabowski et al., 2013; Gupta et al., 2021; Huang and Bowblis, 2018; Joyce et al., 2018; Lin, 2014). However, since these characteristics are often strongly correlated, it is difficult to isolate the impact of a single characteristic, even with exogenous variation at the individual level (Konetzka et al., 2019). In contrast, we analyze the total variation in outcomes that can causally be attributed to provider differences. Prior research documenting overall differences in outcomes across nursing homes either does not take selection on unobservables into account (see for example Arling et al. (2007); Wouterse et al. (2022)), or it focuses on short stays in (U.S.) Skilled Nursing Facilities (see for example Einav et al. (2022); Rahman et al. (2016)). It is essential to know the extent to which selection bias drives total observed variation in outcomes across nursing homes (without attributing it to one characteristic), e.g. for providing valid quality information to consumers, for making fair comparisons of nursing homes' relative performance or for assessing returns on public healthcare spending (OECD and European Commission, 2013).
We find meaningful variation in mortality and hospitalization rates across Dutch nursing homes. We show the value of estimating performance using administrative data, which allow for controlling for a large range of resident characteristics. After extensive case-mix correction, we find that the five percent best-performing nursing homes have mortality and avoidable hospitalization rates that are 7 and 14 percentage points lower, respectively, than those of the worst-performing ones. The results from our selection bias test demonstrate that this variation in outcomes is not attributable to unobservable heterogeneity in resident characteristics. Our findings suggest that outcomes are weakly correlated with only a small subset of process and structure indicators.

2 Background

Nursing homes in the Netherlands
Nursing homes may serve two groups. First, they serve residents (or clients 3 ) who need long-term institutional care and who, once admitted, typically stay there for the remainder of their life. Second, they may serve clients who are discharged from the hospital for a (limited) period of rehabilitation care or post-acute care. 4 In the Netherlands and elsewhere, there is a clear distinction between these two. In this study, we focus on the first group: long-term institutional stays. 5 For this group, the Netherlands has comprehensive social long-term care (LTC) insurance that pays for 99.9% of total nursing home care expenditures (CBS, 2017). Nursing home care, including costs for room and board, is covered by the insurance for the entire population. Nursing home recipients pay a relatively low co-payment that covers 11% of total expenditures (Rijksoverheid, 2017). The co-payment depends on the recipient's income and wealth but not on the type of care received or the nursing home chosen (Tenand et al., 2021). This makes Dutch nursing home care accessible.
Elders need to apply for eligibility for a nursing home admission, which is granted if someone needs supervision or care around the clock. This eligibility decision is made by an independent government agency (CIZ). CIZ also decides on the care package which indicates the intensity of nursing home care that the recipient is eligible for. 6 Elders who are eligible for a nursing home admission may choose any nursing home with availability for the desired care intensity package. The waiting time in each of the regions in the Netherlands (during our study period) is limited: virtually all eligible elders can move to a nursing home within the 6-week period that is set as the norm by the government (NZa, 2021). However, some elders choose to delay their admission until their preferred nursing home has an opening and are then put on a nursing home-specific waiting list while they temporarily live in another nursing home or receive substitute home care.
All providers are private entities that are not allowed to make a profit (Barber et al., 2021). 7 Nursing homes receive a per diem price per client, up to a budget ceiling; these prices are negotiated with regional single payers who contract long-term care providers. The prices are specific to each care intensity package and are constrained by a maximum price set by the government (Barber et al., 2021).
There are several measures in place to stimulate the provision of high-quality care. First, since 2017, nursing home budgets are supplemented by a subsidy for quality improvements. To receive this additional subsidy, nursing homes submit a quality improvement plan. Second, the Healthcare Inspectorate monitors quality of care, e.g. through unannounced visits. Its quality reports are published. Third, nursing homes are required to provide information about processes of care to the government, which is published online. Finally, nursing homes are obliged to enable residents and their relatives to report their satisfaction with the nursing home. Almost all providers do this through a public website called Zorgkaart Nederland. These online ratings are intended to assist (relatives of) future nursing home residents in selecting a nursing home.
3 The terms nursing home residents and clients are used interchangeably throughout this paper.
4 In the US, this care may be provided in skilled nursing facilities.
5 Some Dutch nursing homes also offer day-care for elders who live at home or (short-term) rehabilitation care. Elders receiving these types of nursing home care are not included in this study.
6 Residents with lower care intensity (ZZP 4) need intensive support and extensive care; those with dementia care (ZZP 5) need a protective living facility with intensive dementia care; those with higher care intensity (ZZP 6) need a protective living facility with intensive support and care; and those with the highest care intensity (ZZP 7 and 8) need a protective living facility with very intensive care and treatment or support (CIZ, nd). A resident's care intensity package may change during his/her stay in a nursing home.
7 There is a small but increasing number of for-profit nursing homes (Bos et al., 2020; Hussem et al., 2020). These nursing homes are not included in the analysis of this paper.

Measuring nursing home performance
Nursing home quality is multidimensional and can be classified into three dimensions, namely structure, processes, and outcomes (Donabedian, 2003). In most countries, nursing home quality measures focus on the structure and process dimensions of quality of care (Barber et al., 2021). Yet, Mor et al. (2003) and Werner et al. (2013) show that nursing homes that perform well on structure and process-based quality measures do not necessarily improve (health-related) outcomes of their residents. To provide a comprehensive set of quality information it is thus worthwhile to complement the widely used structure and process-based measures with information on outcomes. This subsection discusses prior work on outcome measurement in the nursing home sector and highlights two themes: the use of mortality and avoidable hospitalizations as outcome measures, and why there may be a selection bias in performance indicators using such outcomes.
Using mortality and avoidable hospitalizations as outcome measures
According to Gupta et al. (2021) and McClellan and Staiger (1999), mortality has become the "gold standard" for measuring quality in the health economics literature. Several extensive literature reviews indicate that reduced risk of mortality is associated with higher wellbeing of older persons (Chida and Steptoe, 2008; Martín-María et al., 2017), which makes it a good candidate for measuring nursing home outcomes. Likewise, a hospital stay is not only costly but, more importantly, is also found to be traumatic, uncomfortable and disorienting for nursing home residents (Grabowski et al., 2007; Ouslander et al., 2000). We believe that both mortality and potentially avoidable hospitalizations are undesirable and that nursing homes with lower mortality and fewer avoidable hospitalizations are, all else equal, performing better. 8 Research shows that variation in such health outcomes can at least to some extent be attributed to factors influenced by the nursing home. For example, Cornell et al. (2019) show that residents admitted to Skilled Nursing Facilities with higher STAR ratings have lower mortality and fewer hospitalizations. Other channels through which nursing homes are found to affect outcomes such as hospitalizations and mortality are staffing levels, private equity ownership, nonprofit status and the presence of a dementia special care unit (Grabowski et al., 2013; Gupta et al., 2021; Joyce et al., 2018; Friedrich and Hackmann, 2021). Therefore, we would expect at least some variation in terms of these outcomes across nursing homes.
The outcomes that we measure are restricted to the health domain. Ideally, we would measure outcomes that go beyond health, like individual level wellbeing and quality of life. However, routinely measuring these on a large scale in such a vulnerable population is not feasible. The main advantages of using mortality and avoidable hospitalizations as outcomes are that they are not self-reported, available for the full population and not prone to measurement error. Furthermore, the econometric issues that we deal with apply to all outcome measures, making this study a relevant illustration of nursing home performance measurement problems more generally. As discussed in the previous two paragraphs, mortality and avoidable hospitalizations likely capture sufficiently relevant aspects of nursing home performance to be indicative of other types of relevant outcomes.
8 As some hospitalizations may be unavoidable, we focus on those that are potentially avoidable. We do not focus on potentially avoidable causes of death as most classifications of avoidable causes of death are based on premature mortality, defined as dying before the age of 70 (OECD, 2009). Since our sample is restricted to those aged 70 and older, applying such a classification would be inappropriate. Additionally, we do not expect euthanasia to contribute much to differences in mortality across nursing homes, since euthanasia occurred only 286 times in total in nursing homes in 2017 (i.e. 1 percent relative to the number of deaths in our sample in the same year) (Heins et al., 2019).
Selection bias in the nursing home setting
Variation in outcomes may be driven by selection bias. There are several reasons why nonrandom selection could occur in the nursing home setting. First, some nursing homes may selectively attract a certain type of client. For-profit nursing homes might, for example, have an incentive to attract more profitable or less costly clients, especially when they are close to their full capacity (Gandhi, 2020; He and Konetzka, 2015). Second, different types of individuals may choose a nursing home based on different criteria, which could cause performance measures to be either positively or negatively biased. On the one hand, elders (or their family members) who consider themselves more likely to be severely ill, and more dependent on care services, may be more inclined to choose a nursing home that has a reputation for delivering higher quality care. On the other hand, elders who are more severely ill may be less able to "shop around" for quality care, especially after a sudden impairment, and may end up choosing a nursing home with no waiting list instead of one with higher perceived quality (Castle, 2003; Schmitz and Stroka-Wetsch, 2020). Also, prospective residents who are more responsive to quality might be wealthier and better educated (Bensnes and Huitfeldt, 2021) or have better informal networks. Elders with such an advantageous environment can generally be expected to have a better (unobserved) health status, and they may also be more responsive to quality indicators when choosing other types of healthcare providers (Bensnes and Huitfeldt, 2021; Cornell et al., 2019). In sum, some degree of nursing home selection may be expected, but it seems hard to predict the direction of any bias a priori.
The literature on selection bias in nursing home outcome measures is limited and, like most research on nursing home quality (Lippi Bruni et al., 2019), generally focuses on Skilled Nursing Facilities (SNFs) in the United States. 9 Arling et al. (2007) demonstrate that shifts occur in SNF quality rankings when more observable differences in client characteristics are accounted for. This indicates that there may also be some selection on observable characteristics of SNF clients. However, how much selection remains when many observable characteristics are already accounted for? Rahman et al. (2016) report wide variation in risk-adjusted re-hospitalization rates across SNFs (i.e. 15 percentage points between the five percent best and worst performing facilities). They show that these rates accurately predict the re-hospitalization risk of individuals admitted to these SNFs a few years later, suggesting that variation in risk-adjusted re-hospitalization rates is not driven by selection on unobservables. In contrast, Einav et al. (2022) do find evidence of selection bias driven by unobservably healthier clients being more likely to be admitted to SNFs that generate larger health improvements.
It is still an open question whether the prior results on selection bias from U.S. SNFs are relevant for the Dutch nursing home setting. Although both types of facilities offer institutional care mainly to older clients who cannot live at home yet or anymore, the Dutch system, like that of many other developed countries, almost exclusively concerns long stays rather than short stays and has a much more comprehensive social system for long-term care (Barber et al., 2021). On the one hand, selection likely plays a more prominent role in the U.S. setting due to financial incentives to admit short-stay non-Medicaid patients (see also Gandhi (2020)). Moreover, the focus on nonprofit nursing homes, which form the largest part of all nursing homes in the Netherlands (Bos et al., 2020), may imply a smaller role for non-random selection, since nonprofit nursing homes may be less inclined to selectively attract healthier clients. On the other hand, the choice process may be more selective for long stays since it requires a decision on where to reside until death (Bom, 2021), rather than on where to stay for 26 days, the average length of stay in post-acute care in the U.S. (Cornell et al., 2019).

Nursing home residents
We use administrative data provided by Statistics Netherlands encompassing the full Dutch population (more detailed information about the data sources can be found in Appendix A.1). Our sample consists of individuals who were admitted to a nursing home for the first time between January 2015 and July 2019. 10 We use individual-level information on provider codes and addresses from the municipal registry to link 87 percent of the 2015-2017 population to the nursing homes that they were admitted to. 11 We complement this sample with individuals who entered a nursing home between 2018 and July 2019 and whom we could match to nursing homes using information on addresses only. As a first step to make the resident populations across nursing homes more comparable, we dropped 9,438 (7%) admissions of residents who were younger than 70 at the time of admission 12 , followed by 3,627 (3%) admissions of individuals for whom data on background characteristics are missing. Our final study sample includes 119,699 nursing home residents in the mortality analyses and 83,056 residents in the hospitalization analyses, because data on hospitalizations are only available until December 2017. 13

Nursing homes
Our data contain an anonymized provider code, but not the location. We do observe where individuals live and therefore identify nursing home locations as addresses at which at least 5 individuals receive care from the same provider (based on the provider code) within the same time period. Nursing home facilities belonging to the same chain organization can use the same provider code, but are distinguished using the address information. We use the provider codes combined with postal codes to link the information on quality indicators. Descriptive statistics about the quality indicators of the included nursing homes can be found in Appendix Table B1.
10 We focus on people with care packages 4-8, which are for long-term nursing home stays. That is, we exclude people who are eligible for palliative care (care package 10) or geriatric rehabilitation (care package 9).
11 In 31,688 cases, address information was missing and the provider code belonged to multiple nursing home facilities within an organization. For this group, we imputed to which nursing home facility the resident was admitted using admission data of the nearest neighbour with the same provider code and non-missing address information.
12 Although this is a significant share of our sample, being admitted to a nursing home before the age of 70 is a rare event (i.e. between 0 and 0.5 percent depending on the age group) in the Netherlands.
13 We did not exclude residents who switched nursing homes during the study period because doing so may lead to underestimating the variation in performance, as residents may switch from low to high quality homes.
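The location rule described in this subsection can be illustrated with a small sketch. It is not the authors' code: the record layout (person, provider code, address, period), the function name `identify_facilities`, and the use of a calendar year as the "time period" are assumptions made purely for illustration.

```python
from collections import defaultdict

MIN_RESIDENTS = 5  # threshold stated in the text


def identify_facilities(records, min_residents=MIN_RESIDENTS):
    """Return the (provider_code, address) pairs treated as facilities.

    records: iterable of (person_id, provider_code, address, period) tuples
    (a hypothetical layout). A pair qualifies if at least `min_residents`
    distinct persons receive care at that address from that provider
    within a single period.
    """
    residents = defaultdict(set)
    for person_id, provider_code, address, period in records:
        residents[(provider_code, address, period)].add(person_id)
    return {
        (provider_code, address)
        for (provider_code, address, _period), persons in residents.items()
        if len(persons) >= min_residents
    }


# Hypothetical toy data: provider "P1" delivers care at two addresses,
# but only "Addr A" reaches the threshold of five residents.
records = [(i, "P1", "Addr A", 2015) for i in range(6)]
records += [(i, "P1", "Addr B", 2015) for i in range(10, 13)]
facilities = identify_facilities(records)
```

Keying on the pair of provider code and address mirrors the point that facilities of one chain may share a provider code and are told apart by address.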
We include the 849 largest nursing home facilities, with at least 50 new admissions during the entire study period, in our main analysis. 14 Figure 1 shows the variation in size, in terms of the number of newly admitted residents across nursing homes. To ensure sufficient statistical power, we do not attempt to estimate performance for the nursing homes with fewer admissions (to the left side of the red line in Figure 1). The 21 percent of residents who are admitted to one of these 1,008 smaller nursing home facilities are included in the reference category in the analyses.

Health outcome measures
We focus on two outcomes, namely mortality and avoidable hospitalization. We define hospitalizations as potentially avoidable if they are related to a main diagnosis that could have been prevented or treated in the nursing home. For example, hospitalizations resulting from falls in nursing homes may be preventable by hip protectors (Vu et al., 2006) or by adaptations to the environment like an optimized light design for residents with cataract or height adjusted chairs (Becker and Rapp, 2010). The diagnoses that we classify as such (see also Table B2 in the Appendix) are based on two studies on ambulatory care-sensitive hospitalizations for the elderly population (Carter, 2003; Walker et al., 2009), to which we add hospitalizations due to falls and fractures, wounds and rehabilitation as potentially preventable or treatable in a nursing home, and thus potentially avoidable. More than half of the avoidable hospitalizations are due to falls and fractures, about 8 percent to pneumonia, 6 percent to asthma and COPD, 5 percent to rehabilitation and 4 percent to kidney or urinary tract infections (Appendix Table B2).
We construct binary outcomes (equal to one if the individual died or had an avoidable hospitalization within 180 days after admission) as our main dependent variables to limit the influence of right censoring. 15,16 The 180-day cut-off is somewhat arbitrary, but in line with the nursing home literature (Cornell et al., 2019; Intrator et al., 2004; Vossius et al., 2018). Additionally, Kaplan-Meier survival curves in Figures C1a and C1b in the Appendix confirm that most variation in time until death and until an avoidable hospitalization occurs in the first half year after nursing home admission. The robustness checks examine the sensitivity of our results to the use of different cut-offs. In our sample, the average 180-day mortality and avoidable hospitalization rates are 21 and 13 percent respectively (Table 1). 17,18
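As an illustration of how such a 180-day indicator could be built from admission and event dates, consider the sketch below. The function name `event_within_window` and the convention that both the admission day and day 180 fall inside the window are assumptions; the text does not specify boundary handling.

```python
from datetime import date, timedelta

WINDOW_DAYS = 180  # main cut-off used in the text


def event_within_window(admission, event, window_days=WINDOW_DAYS):
    """Return 1 if `event` falls within `window_days` after `admission`, else 0.

    `event` is the date of death or of the first avoidable hospitalization,
    or None if no such event is observed. Treating the admission day and
    day `window_days` as inside the window is an assumption of this sketch.
    """
    if event is None:
        return 0
    elapsed = (event - admission).days
    return int(0 <= elapsed <= window_days)


# Hypothetical resident: died 120 days after admission, never hospitalized.
admission = date(2016, 1, 1)
died_180d = event_within_window(admission, admission + timedelta(days=120))
hosp_180d = event_within_window(admission, None)
```

Passing a different `window_days` gives the alternative cut-offs that the robustness checks refer to.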

Case-mix controls
We control for observable differences in nursing home residents' characteristics by including an extensive set of case-mix controls in our analyses: age at admission; gender; whether someone lives in a rural municipality, defined by an average of at least one thousand addresses per square kilometer; yearly disposable household income, standardized by household size; wealth from assets and savings; and a comprehensive set of proxies for health: whether the person visited the hospital within 30 days prior to nursing home admission (also to account for potential hospital re-admissions); the Charlson comorbidity index based on 12 comorbidities like dementia, cancer and pulmonary diseases 19 ; the number of different types of medicine consumed; and healthcare expenditures from the year before nursing home admission. Furthermore, we include care needs as measured by the care intensity package as determined by the independent eligibility assessment agency. An overview and more extensive explanation of all covariates can be found in Appendix Table B4. Table 1 shows how these covariates vary across the health outcomes of individuals. Older male residents, those receiving a higher care intensity package, those who use more medication, those with a higher Charlson comorbidity index or those who visited the hospital within 30 days before nursing home admission, have a higher probability of dying within the next half year. On the other hand, residents who experienced an avoidable hospitalization within 180 days after admission are, on average, younger and enter the nursing home receiving lower care intensity. This implies that healthier individuals (i.e. younger with lower care needs) are more likely to be admitted to a hospital. Nonetheless, as we control for differences in underlying health across nursing homes, we interpret a higher risk of avoidable hospitalizations as an undesirable outcome.
15 While mortality is already a binary event by nature, we could measure the hospital outcome as the number of avoidable hospitalizations. However, as this may be influenced by re-admissions, we use the count in our robustness tests only.
16 As data on mortality are available up until 2019 and only 7 percent of individuals left the nursing home before their death, we are not concerned about right-censoring in this outcome measure. The hospitalization outcome is not adjusted for censoring due to death.
17 The 180-day mortality rate is similar to that in skilled nursing facilities in the U.S. (Cornell et al., 2019) and slightly higher than in care homes in Norway (Vossius et al., 2018).
18 Table B3 shows that, of the 23,165 residents who had at least one hospitalization within half a year after nursing home admission, 37 percent experienced a hospitalization that was potentially avoidable. Both this percentage and the 180-day hospitalization rate are higher than in other studies (Carter, 2003; Intrator et al., 2004; Walker et al., 2009), likely resulting from the inclusion of falls and fractures as an avoidable cause.
19 The Charlson comorbidity index is an indicator of disease burden and/or a predictor of mortality (Sundararajan et al., 2004). We use the updated version constructed by Quan et al. (2011), which reflects a weighted score based on 12 comorbidities, among which dementia, diabetes and cancer.
Notes to Table 1: This table presents the averages or shares (%) of each case-mix control variable by the mortality and avoidable hospitalization outcome, including differences between those for whom the outcome equals one and zero. Age, care intensity, rural and year are measured at the moment of nursing home admission; healthcare expenditures, wealth, std. household income, number of medicine and Charlson score are from the (calendar) year before admission. Standard errors (se) between brackets. * Difference is statistically significant at 10 percent; ** at 5 percent; *** at 1 percent.
1 Lower: intensive support and extensive care; Dementia: protective living facility with intensive dementia care; Higher: protective living facility with intensive support and care; Highest: protective living facility with very intensive care and treatment or support.
4 Empirical strategy

Observed performance
To quantify the effect of a nursing home $j$ on the probability of an adverse health outcome, we use a linear value-added framework:
$$Y_i = X_i'\beta + \delta_j + \varepsilon_i \qquad (1)$$
where $Y_i$ is the outcome for individual $i$ conditional on being admitted to nursing home $j$. The expected outcome depends on an individual's observed characteristics $X_i$, which include proxies for prior health, an unobserved individual component $\varepsilon_i$, and a nursing home specific effect $\delta_j$. 20 $\delta_j$ is the nursing home level estimate of interest: the value added of the nursing home, i.e. the nursing home's impact on the outcome under the condition of exogenous nursing home choice. We assume that the nursing home impact is additive and homogeneous across residents.
We estimate a linear probability model using an ordinary least squares regression (with robust standard errors) to obtain each nursing home's performance on the two health outcomes.21 The estimation equation is as follows:

$$Y_i = \alpha_0 + \sum_{j=1}^{J} \delta_j H_{ij} + X_i \beta + \varepsilon_i \qquad (2)$$

where $Y_i$ is a zero-mean dichotomous outcome variable for individual i (e.g. mortality) and $X_i$ are the individual level case-mix controls. $\alpha_0$ represents the reference category, which includes all individuals who were admitted to one of the smaller nursing homes. $H_{ij}$ is a dummy variable that equals one if individual i is admitted to nursing home j (j = 1, 2, ..., J). The estimated parameter $\hat{\delta}_j$ reflects nursing home j's effect on the outcome, i.e. the nursing home's value added.
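As a minimal sketch of how a specification like Equation (2) can be estimated, the following Python snippet simulates a small data set and fits a linear probability model with nursing home dummies by OLS. Everything in it is an assumption for illustration: the three homes, the single covariate `x`, the true effects in `TRUE_DELTA` and the helper names `solve` and `ols` are hypothetical and are not taken from the paper's data or code. Robust standard errors are omitted, and the outcome is not demeaned because the intercept absorbs its mean.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        # Partial pivoting for numerical stability.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[a] * r[c] for r in X) for c in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

random.seed(0)
TRUE_DELTA = [0.05, -0.03, 0.00]   # hypothetical value added of homes 1..3
J = len(TRUE_DELTA)

X, y = [], []
for i in range(20000):
    home = random.randrange(J + 1)          # 0 = reference group (small homes)
    x = random.uniform(-1.0, 1.0)           # one stylised case-mix control
    delta = TRUE_DELTA[home - 1] if home > 0 else 0.0
    p = 0.30 + 0.10 * x + delta             # linear probability of the outcome
    dummies = [1.0 if home == j else 0.0 for j in range(1, J + 1)]
    X.append([1.0] + dummies + [x])         # columns: alpha_0, H_i1..H_iJ, X_i
    y.append(1.0 if random.random() < p else 0.0)

coef = ols(X, y)
delta_hat = coef[1:J + 1]                   # estimated value added per home
print([round(d, 3) for d in delta_hat])
```

The estimated dummy coefficients should lie close to the simulated effects, with sampling noise that grows as the number of residents per home shrinks.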
Our estimates $\hat{\delta}_j$, especially those for small nursing homes, are surrounded by sampling imprecision. Like Angrist et al. (2017) and Kane and Staiger (2008), we therefore account for the statistical noise in our value-added estimates by applying a standard empirical Bayes correction (Morris, 1983). The empirical Bayes estimator of $\delta_j$ is a weighted average of the precisely estimated grand mean (the average outcome across all nursing homes) and the imprecisely estimated nursing-home-specific OLS estimate $\hat{\delta}_j$, where the weight of the latter decreases with its estimation error. As the estimation error is greater for small homes, the shrinkage is larger for these homes than for large homes.22 We use these empirical Bayes estimates when evaluating total variation in performance in Section 5.1 and when correlating performance with other quality indicators in Section 6.

20 Our value-added model deviates from the classical ones in the sense that we include proxies for the individual's health as right-hand side variables instead of the individual's outcome $Y_i$ prior to admission. The latter is simply not possible given the nature of our outcome variables.

21 Estimating the same specification with a logit or random effects model generates estimates that correlate highly (> 0.99) with the ones from the OLS procedure.

22 We use the following estimator: $\delta^{EB}_j = \tau_j \hat{\delta}_j + (1 - \tau_j)\bar{\delta}$, where $\hat{\delta}_j$ is obtained by estimating Equation (2), $\bar{\delta}$ is equal to the average of $\hat{\delta}_j$ across all nursing homes, and the shrinkage factor $\tau_j$ is equal to $\tau_j = \sigma^2 / (\sigma^2 + se^2_{\delta_j})$, with $\sigma^2$ being the between nursing home variation minus the average noise and $se^2_{\delta_j}$ being equal to the within nursing home variation. Under the assumption that the nursing-home specific effects are independent, this estimator is equivalent to an empirical Bayes estimate of the nursing home specific effects given that both the prior and likelihood function come from a normal distribution (Angrist et al., 2017; Chetty and Hendren, 2018; Kane and Staiger, 2008; Morris, 1983).
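The shrinkage step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function name `eb_shrink`, the four raw estimates and their standard errors are hypothetical, and the signal variance is computed as the sample variance of the raw estimates minus the average squared standard error, which is one plausible reading of "between nursing home variation minus the average noise".

```python
def eb_shrink(delta_hat, se):
    """Shrink noisy estimates towards their grand mean.

    Returns (shrunk estimates, shrinkage factors tau_j).
    """
    J = len(delta_hat)
    grand_mean = sum(delta_hat) / J
    # Between-home variance of the raw estimates ...
    total_var = sum((d - grand_mean) ** 2 for d in delta_hat) / (J - 1)
    # ... minus the average sampling noise gives the signal variance sigma^2.
    avg_noise = sum(s ** 2 for s in se) / J
    sigma2 = max(total_var - avg_noise, 0.0)
    taus = [sigma2 / (sigma2 + s ** 2) for s in se]
    shrunk = [t * d + (1 - t) * grand_mean for t, d in zip(taus, delta_hat)]
    return shrunk, taus

# Hypothetical raw estimates: two large homes (small se), two small homes (large se).
delta_hat = [0.10, -0.10, 0.04, -0.04]
se = [0.02, 0.02, 0.08, 0.08]
shrunk, taus = eb_shrink(delta_hat, se)
print([round(t, 3) for t in taus])
print([round(d, 4) for d in shrunk])
```

In this example the precisely estimated homes keep most of their raw estimate, while the noisy estimates of the small homes are pulled much further towards the grand mean.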

Testing for selection bias
Individuals are not randomly assigned to nursing homes, but are to a large extent free to choose the home that they prefer. We might therefore be worried about selection bias. The question we want to address is: after we correct for observable differences in individuals' characteristics $X_i$, are the estimates of observed performance $\hat{\delta}_j$ biased by unobserved individual differences? This is the case if there is a correlation between an individual's unobserved health and the performance (i.e. value added) of the nursing home he or she goes to (i.e. a correlation between $\varepsilon_i$ and $H_{ij}$ in Equation (2)). The bias can be either positive or negative, as preferences for nursing home quality can be either positively or negatively correlated with (unobserved) individual health (see Section 2.2).
The standard way of dealing with selection bias is to focus on plausibly exogenous variation in nursing home choice and exploit this variation, using instrumental variable analysis, to obtain (causal) estimates of $\delta_j$. In our case, this would entail instrumenting each of the J = 849 nursing home dummies in Equation (2), which requires at least 849 instrumental variables to obtain a just or over-identified model. Although this can be done in some settings (see Gowrisankaran and Town (1999) and Hull (2020), who use such an approach for hospital care), in the setting of nursing home care, with many small-sized providers, this is not feasible because the lack of power likely causes a many weak instruments problem (Angrist et al., 2016).23

Instead of trying to obtain a 'causal' IV-estimate for each nursing home, we test ex post whether the observed performance measures $\hat{\delta}_j$, estimated by Equation (2), are biased on average. The intuition behind the test is as follows. Suppose we had already estimated the case-mix corrected (or value-added) scores from Equation (2) and could afterwards randomly assign a new group of individuals over the J nursing homes. If the estimated performance scores $\hat{\delta}_j$ were unbiased, then these scores would perfectly predict the (average) outcomes for the randomly assigned group. We could run the following regression on the sample of randomly assigned individuals:

$$Y_i = \alpha + \lambda \hat{\delta}_{ij} + \varepsilon_i \qquad (3)$$

with $\hat{\delta}_{ij}$ the estimated performance score of the nursing home to which individual i has been assigned. This regression would provide a simple test of (average) selection bias based on the forecast coefficient λ: if the values of $\hat{\delta}_{ij}$ represent (on average) the true causal effects of nursing homes on the outcome, then λ should be equal to one.24 If, on the other hand, the estimates suffer from selection bias, then λ will be either smaller or larger than one.
If there is a positive correlation between unobserved health and nursing home quality (healthier clients are more likely to choose better nursing homes), then λ will be smaller than one. We then overestimate nursing homes' performance, in the sense that the observed performance is better than true performance for high quality homes and worse than true performance for low quality homes. If the correlation between unobserved health and nursing home quality is negative, then the observed performance is an underestimation of the true effect.

23 In a recent working paper, Einav et al. (2022) estimate the added health value of skilled nursing facilities in the U.S., using a control-function approach (which is quite similar in spirit to an IV-approach) to correct for potential selection on unobserved health. The average number of treated patients in these facilities, aimed at rehabilitation, is substantially higher than in the permanent residential homes we investigate. Also, Einav et al. (2022) seem to use a relatively restrictive model for patient choice.

24 Although λ = 1 implies that there is no bias on average across nursing homes, this does not rule out the possibility that the scores of some specific homes are biased (see Angrist et al. (2017); Hull (2020)).
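The thought experiment behind the forecast coefficient can be illustrated with a small simulation. The sketch below is purely hypothetical: the number of homes, the evenly spread true effects, the continuous outcome (used instead of a binary one for simplicity) and the `sorting` parameter are assumptions chosen for illustration. Scores are first estimated on a sample in which unobserved frailty may be related to the home's true effect, and are then used to forecast outcomes for newly, randomly assigned residents.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def forecast_coefficient(sorting, seed=1, J=50, n_per_home=400, noise=0.2):
    rng = random.Random(seed)
    # Hypothetical true effects on the adverse outcome, evenly spread.
    true_delta = [-0.10 + 0.20 * j / (J - 1) for j in range(J)]

    # Step 1: estimate scores on a sample with (possible) sorting.
    # Unobserved frailty equals sorting * delta_j plus noise, so with sorting > 0
    # frailer residents end up in homes with worse true effects.
    delta_hat = []
    for d in true_delta:
        ys = [d + sorting * d + rng.gauss(0, noise) for _ in range(n_per_home)]
        delta_hat.append(sum(ys) / n_per_home)

    # Step 2: randomly assign new residents; frailty is now unrelated to the home.
    scores, outcomes = [], []
    for _ in range(J * n_per_home):
        j = rng.randrange(J)
        scores.append(delta_hat[j])
        outcomes.append(true_delta[j] + rng.gauss(0, noise))
    return slope(scores, outcomes)

lam_unbiased = forecast_coefficient(sorting=0.0)
lam_sorted = forecast_coefficient(sorting=1.0)
print(round(lam_unbiased, 2), round(lam_sorted, 2))
```

Without sorting the forecast coefficient is close to one (slightly attenuated by sampling noise in the scores, which is what shrinkage addresses), whereas with sorting it moves clearly away from one.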

Instrumental variable approach
In practice, we cannot randomly assign a group of clients over the different homes, and thus have to rely on quasi-exogenous variation instead. If there is a subgroup within our population for which it is credible that nursing home choice is not related to expected outcomes, then we can use this group to perform a test similar to that for the imaginary randomly assigned group in Equation (3).
The source of variation we exploit is geographical distance from a client's home to a nursing home. Distance is an important driver of nursing home choice. Both earlier and more recent literature report distance to be a strong, if not the dominant, driver of nursing home choice (Castle, 2003;Gadbois et al., 2017;Hackmann, 2019;Schmitz and Stroka-Wetsch, 2020;Shugarman and Brown, 2006). As a result, location-based instruments are used widely to predict provider choice both in and beyond the nursing home literature (Einav et al., 2022;Gandhi, 2020;Gowrisankaran and Town, 1999;Grabowski et al., 2013;Hull, 2020;Newhouse and McClellan, 1998). Moreover, a location-based instrument is unlikely to be related to the unobserved component of the individual's outcome as regional differences in health are expected to be small in our setting, especially since we control for an extensive set of health proxies at the individual level.
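To make the construction of a distance-based instrument concrete, the sketch below identifies the closest nursing home for a given prior residence. It assumes straight-line (great-circle) distance computed with the haversine formula; this text does not state how distance is measured, so that choice, the function names and the coordinates (city centres standing in for addresses) are all hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closest_home(residence, homes):
    """Return (name, distance_km) of the home nearest to the residence."""
    name = min(homes, key=lambda h: haversine_km(*residence, *homes[h]))
    return name, haversine_km(*residence, *homes[name])

# Hypothetical example: city-centre coordinates stand in for addresses.
homes = {
    "Amsterdam": (52.3676, 4.9041),
    "The Hague": (52.0705, 4.3007),
    "Utrecht": (52.0907, 5.1214),
}
residence = (51.9225, 4.4792)  # Rotterdam
name, dist = closest_home(residence, homes)
print(name, round(dist, 1))
```

The performance score of the home returned by `closest_home` would then serve as the instrument for the score of the home that the individual actually enters.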
To implement the forecast test using quasi-exogenous variation in nursing home choice based on distance, we perform an IV analysis using two-stage least squares (similar to Abaluck et al. (2021); Deming (2014); Helsø et al. (2019)). In the first stage, we predict the observed performance score $\hat{\delta}_{ij}$ of the nursing home j that individual i actually goes to using the observed performance score $\hat{\delta}^{Closest}_{ij}$ of the nursing home that is closest to individual i's former residence:

$$\hat{\delta}_{ij} = \pi_0 + \pi_1 \hat{\delta}^{Closest}_{ij} + \nu_i \qquad (4)$$

In the second stage, we use the first-stage predictions of the performance score, $\tilde{\delta}_{ij}$, to examine the effect of the nursing home performance scores on the outcomes (only) for individuals who move to a nursing home because it is the closest to their prior home. We do this by regressing individuals' outcome on the first-stage prediction of the performance score:

$$Y_i = \alpha + \lambda \tilde{\delta}_{ij} + \varepsilon_i \qquad (5)$$

If our instrument is valid (the performance of a nursing home is uncorrelated with the unobserved health of the clients that live closest to it)25, the interpretation of the forecast coefficient $\hat{\lambda}$ is the same as in Equation (3): if $\hat{\delta}_j$ is an unbiased estimate of the true effect of a nursing home on clients' outcomes, it should (on average) perfectly predict the outcomes for clients who go to a particular home solely because it is the closest to their prior residence. A forecast coefficient that is not equal to one then signals that observed performance is, on average, biased.

25 We further have to assume that the effect of $\hat{\delta}_j$ is homogeneous (i.e. constant for all individuals). Lambda is also affected by how compliers are distributed over the range of values that the treatment variable takes. If this distribution is not the same as in the full sample and if there are heterogeneous treatment effects, a deviation in lambda from one might not only be caused by selection bias (i.e. selection of low-mortality patients into low-mortality nursing homes) but also by systematic variation in the distribution of compliers over heterogeneous treatment effects. Our LATE would then deviate from one not because of a bias, but due to the high mortality estimates getting a lower weight due to lower compliance on that side of the distribution. We argue that this is unlikely to be an issue in our setting as almost every nursing home - irrespective of its performance - is the closest one to at least 20 people in our sample and is (in 99% of
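The two-stage procedure can be sketched as follows on simulated data. All ingredients are assumptions for illustration: the scores are taken to be unbiased (score equals true effect), 21 percent of simulated individuals pick the closest home while the rest pick any home at random, the outcome is continuous rather than binary, and no control variables are included. The helper `ols_line` is hypothetical.

```python
import random

def ols_line(x, y):
    """Intercept and slope of an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

rng = random.Random(2)
J, n = 100, 50000
# Hypothetical performance scores, assumed unbiased here (score = true effect).
score = [-0.10 + 0.20 * j / (J - 1) for j in range(J)]

closest_score, chosen_score, outcome = [], [], []
for _ in range(n):
    closest = rng.randrange(J)                 # home nearest to prior residence
    # Assumption: 21% pick the closest home, the rest pick any home at random.
    chosen = closest if rng.random() < 0.21 else rng.randrange(J)
    closest_score.append(score[closest])
    chosen_score.append(score[chosen])
    outcome.append(score[chosen] + rng.gauss(0, 0.1))   # stylised continuous outcome

# First stage: score of the chosen home on score of the closest home.
pi0, pi1 = ols_line(closest_score, chosen_score)
predicted = [pi0 + pi1 * z for z in closest_score]

# Second stage: outcome on the first-stage prediction gives the forecast coefficient.
_, lam = ols_line(predicted, outcome)
print(round(pi1, 2), round(lam, 2))
```

Because the simulated scores are unbiased, the forecast coefficient should be close to one. Note that running the second stage by hand, as done here, yields the correct point estimate but not correct standard errors; a dedicated 2SLS routine would be needed for inference.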

Instrumental variable assumptions
We first reflect on the assumptions that must hold to interpret $\hat{\lambda}$ as the impact of nursing home performance on a random individual's outcome. Two of these assumptions of the IV approach are that the instrumental variable, in our case the performance of the closest nursing home, is (i) relevant and (ii) valid. A third condition is monotonicity.26 We discuss the weak monotonicity assumption, which is sufficient for causal interpretation (Frandsen et al., 2019), in Appendix A.2. In the following subsections we pay more attention to why the relevance and validity assumptions are likely to hold in this setting.

Relevance
The instrument is relevant if it has strong predictive power for nursing home choice. Prior studies argue and show that travel distance is the most important determinant of nursing home choice (Castle, 2003; Gadbois et al., 2017; Gandhi, 2020; Hackmann, 2019; Schmitz and Stroka-Wetsch, 2020; Shugarman and Brown, 2006). In our setting, we therefore expect that, all else equal, individuals prefer a nursing home that is closer to their prior home. Figure 2a confirms that most residents choose a nursing home that is close to their prior home: more than 60 percent of our sample is admitted to a nursing home within 5 kilometers of their prior home. Figure 2b shows that 21 percent choose the nursing home that is closest to their former home. This suggests that the instrument is likely to be relevant.
The results from the first stage regression (Equation (4)) confirm that the instrument is strong. Table 2 shows that the first stage coefficient is economically and statistically significant. The partial F-statistics, which are equal to 7,189 and 18,545, both affirm the relevance of our instrument for both outcome variables (Staiger and Stock, 1997). 27
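With a single instrument, the partial F-statistic equals the square of the first-stage t-statistic (when both use the same variance estimator), so the reported values can be translated into t-statistics and compared with the conventional rule-of-thumb threshold of F > 10 associated with Staiger and Stock (1997). A small sketch of that arithmetic follows; which F-statistic belongs to which outcome is not stated in this passage, so they are left unlabelled.

```python
import math

RULE_OF_THUMB_F = 10.0          # conventional weak-instrument threshold
reported_f = [7189.0, 18545.0]  # partial F-statistics quoted in the text

for f in reported_f:
    t = math.sqrt(f)            # |t| of the single instrument in the first stage
    print(f"F = {f:,.0f}  ->  |t| = {t:.1f}, ratio to threshold = {f / RULE_OF_THUMB_F:,.0f}")

t_values = [math.sqrt(f) for f in reported_f]
```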

Validity
The instrument is valid if the unobserved health and other characteristics of individuals ($\varepsilon_i$ in Equation (5)) are not correlated with the performance of the closest nursing home. That is, performance of the closest nursing home should only be related to an individual's outcome through choice. The validity assumption could be violated if unobservably (un)healthy clients are systematically located closer to the same nursing homes, keeping all individual observable characteristics fixed. This might be the case if better nursing homes are more likely to be located closer to prior homes of individuals with better underlying health (Helsø et al., 2019). This could be an important concern as previous research shows that, at least in the US, high quality nursing homes are more likely to be located in, or closer to, wealthier areas.28 We expect the influence of this issue to be limited in our setting since we include precisely measured individual level covariates like income and wealth when estimating performance.29 Furthermore, Cornell et al. (2019) and Rahman et al. (2016) show that including zip code fixed effects limits the influence of regional differences that may be related to unobserved health and living close to a well-performing nursing home. In two robustness tests in Section 5.4, we therefore include neighbourhood characteristics as controls and neighbourhood fixed effects. We find that these additions do not change our main results.30 This could either imply that the location of well performing nursing homes is not related to the health of neighbouring individuals in the Netherlands, or that our extensive set of covariates, encompassing individual level socio-economic indicators and proxies for the individuals' health, already controls for regional variation in health.

25 (continued) the cases) chosen by at least 5 percent of those. In other words, the distribution of the compliers over the treatment variable is likely the same as the distribution of the entire study population.

26 This condition has received limited attention in other studies using the value-added framework, partly due to the nature of their instruments (Angrist et al., 2017; Chetty et al., 2014).

27 The reported F-statistics are extremely high. This is not surprising since the instrument directly links to the endogenous variable and our sample size is relatively large.

This table reports the estimated coefficient, standard error and partial F-statistic from the first stage (Equation (4)), which is a linear regression of the endogenous performance of the chosen nursing home on the performance of the closest nursing home. Standard errors between brackets. *** Statistically significantly different from zero at 1 percent; ** at 5 percent; * at 10 percent. The sample is restricted to those who are admitted to one of the 849 largest nursing homes.

Figure 3 presents the observed performance estimates ($\hat{\delta}_j$ from Equation (2) after shrinkage) on the y-axis for the 849 largest nursing homes in groups of ten, ranked by their performance.31 There is a 7 percentage point difference in performance on mortality and a 14 percentage point difference in avoidable hospitalizations between the five percent best-performing and five percent worst-performing nursing homes. However, the wide 95% confidence intervals show that the individual estimates are imprecise, which reflects the relatively small size of most homes. This imprecision complicates the interpretation of observed differences: it remains hard to ascertain whether these are driven by true differences in performance or by imprecision.

Test for selection bias
As discussed throughout the paper, the variation observed in the figures above may be driven by unobserved heterogeneity in resident characteristics across nursing homes. In this subsection we present the results of our test for such selection bias. More specifically, after obtaining predicted performance in the first stage (see also Equation (4) and Table 2), we obtain an estimate for the forecast coefficient through Equation (5). We test whether the estimated forecast coefficient $\hat{\lambda}$ is equal to one. If we fail to reject this hypothesis, we interpret our performance estimates as unbiased on average.

28 In spite of this, distance-related instrumental variables are frequently used to correct for non-random selection into hospitals and nursing homes (Cornell et al., 2019; Geweke et al., 2003; Gowrisankaran and Town, 1999; Grabowski et al., 2013; Helsø et al., 2019; Huang and Bowblis, 2018; Newhouse and McClellan, 1998).

29 The likelihood and severity of violations of the validity assumption are small in the Dutch context for several other reasons. First, financial constraints do not play a role when choosing a nursing home, because the co-payment is the same in all nursing homes. This means that selection related to socioeconomic status is likely much more limited than in the US and many other countries. Second, elders are unlikely to select their place of residence (where they lived prior to nursing home admission) according to the performance of the nursing homes, since most elders have lived in the same neighborhood for many years before they enter a nursing home (Diepstraten et al., 2020).

30 We do not include neighbourhood fixed effects in our main specification because the small number of people per neighbourhood moving to a nursing home minimizes the within-neighbourhood variation in which of the nursing homes is the closest one. This significantly reduces the power of the first stage, which in turn decreases the precision of our forecast coefficient.

31 Estimates are published in groups of ten as, for privacy reasons, the results for individual nursing homes cannot be published.

This figure displays the nursing home performances on 180-day mortality (a) and avoidable hospitalizations (b). We present estimated performance (by Equation (2)) after empirical Bayes shrinkage. Nursing homes are ranked on their performances and subsequently divided into 84 equally sized groups of 10 to 11 nursing homes. The x-axis represents these nursing home groups. The y-axis indicates average observed performance of each of these groups and its confidence intervals, which are calculated based on the standard error from a randomly chosen nursing home within each group.
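The grouping of ranked homes for publication can be sketched as below. The function name `rank_groups` and the synthetic estimates are hypothetical; only the counts (849 homes, 84 groups of 10 to 11 homes) come from the text. The confidence intervals described for the figure are not reproduced here.

```python
def rank_groups(estimates, n_groups=84):
    """Rank homes by estimate and split them into near-equal groups.

    Returns a list of groups, each a list of home indices.
    """
    order = sorted(range(len(estimates)), key=lambda j: estimates[j])
    base, extra = divmod(len(order), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)   # first `extra` groups get one more
        groups.append(order[start:start + size])
        start += size
    return groups

# Hypothetical shrunken estimates for 849 homes.
estimates = [((37 * j) % 849) / 849 - 0.5 for j in range(849)]
groups = rank_groups(estimates)
sizes = [len(g) for g in groups]
group_means = [sum(estimates[j] for j in g) / len(g) for g in groups]
print(min(sizes), max(sizes), sizes.count(11))
```

Since 849 = 84 × 10 + 9, nine groups contain 11 homes and the remaining 75 contain 10, which matches the "10 to 11 nursing homes" per group.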
The forecast coefficients $\hat{\lambda}$ are neither economically nor statistically significantly different from one (Table 3). The estimated forecast coefficients deviate only minimally from one: choosing a nursing home with an above-average mortality of 2 percentage points instead of one with 1 percentage point increases the risk of dying by 1.07 percentage points.32 For both outcomes, these minor deviations from one are not statistically significant (p = .408 and p = .104). This implies that observed performance is, on average, unbiased and is likely to predict true differences in performance across nursing homes.
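A test of whether a forecast coefficient equals one can be sketched as a Wald test with one degree of freedom. In the snippet below the coefficient of 1.07 is the mortality value implied by the text, while the standard error of 0.085 is a hypothetical value chosen only for illustration; it is not reported in this passage, and the function name is likewise an assumption.

```python
import math

def wald_test_lambda_equals_one(lam_hat, se):
    """Chi-square (1 df) statistic and p-value for H0: lambda = 1."""
    z = (lam_hat - 1.0) / se
    chi2 = z * z
    # For 1 degree of freedom, P(chi2_1 > z^2) = P(|N(0,1)| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return chi2, p_value

# lam_hat = 1.07 is the mortality coefficient implied by the text;
# se = 0.085 is a hypothetical standard error chosen for illustration only.
chi2, p = wald_test_lambda_equals_one(1.07, 0.085)
print(round(chi2, 2), round(p, 2))
```

A p-value well above conventional significance levels, as in this illustration, means the hypothesis that the coefficient equals one cannot be rejected.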

Subgroup analysis
One of the assumptions of the value-added model is that nursing home performance scores are homogeneous across residents (see Section 4.3). When estimating the relevance of observed performance for clients with different care needs, we shed light on whether this assumption is plausible, i.e. whether observed performance is representative for all clients. For this, we use the forecast coefficient estimated through Equation (5), replacing predicted performance by observed performance of the chosen nursing home, to compare deviations of $\hat{\lambda}$ from one for the different subgroups. This test provides insights into whether observed performance is more informative for specific groups, which is also a relevant question in itself.
The regression results in Table B5 in the Appendix show that the estimates on both outcomes are predictive for residents of all care needs, but for some more accurately than for others. We find that the variation in observed performance on mortality is (on average) slightly overestimated ($\hat{\lambda} < 1$) for individuals with lower and the highest care needs. Although these results suggest that the estimates on outcomes are somewhat heterogeneous across care need groups, this does not affect the IV result from Section 5.2.33

32 To compare, the absolute deviation of the forecast coefficient from one lies within the range found in studies on hospital performance including extensive controls by Helsø et al. (2019) (λ = 0.956) and Hull (2020) (λ = 1.086).

This table reports the forecast coefficient from the second stage (Equation (5)), which estimates the impact of predicted performance of the chosen nursing home on the individual level outcome, either mortality or avoidable hospitalization. The test statistic row reports the χ² statistic and the p-value when testing $\hat{\lambda} = 1$. Standard errors are reported between parentheses and p-values between (square) brackets. *** Statistically significantly different from zero at 1 percent; ** at 5 percent; * at 10 percent. The sample is restricted to those who are admitted to one of the 849 largest nursing homes.

Robustness
We examine the robustness of our results with two additional sets of checks: (i) by including larger sets of controls; and, (ii) by using different definitions for our health outcome measures. We inspect how they correlate with our baseline estimates and whether the result of no structural bias (from Section 5.2) is robust to these adjustments.
First, in Section 4.4 we reflected on the validity of our instrumental variables. We mentioned that any systematic differences in unobserved health that are related to the location of someone's prior home are threats to this validity. Therefore, in two robustness checks, we include either neighbourhood characteristics that might be related to someone's health as additional control variables (i.e. average property (house) value, average household income and the share of households living below the poverty threshold at the neighbourhood level as measures of neighbourhood living standards) or include neighbourhood fixed effects.
Columns 1-2 in Table 4 demonstrate that our results are to a large extent robust to the inclusion of these covariates and neighbourhood fixed effects: performance estimates from both models are highly correlated with our baseline estimates. Additionally, at the 5 percent significance level, we cannot reject that the forecast coefficients $\hat{\lambda}$ are equal to one when including neighbourhood controls or fixed effects.34

Second, by estimating different specifications of our outcome measures, we verify whether the results are sensitive to how the outcomes are defined. If performance varies across different stages of admission, e.g. between the first 90 and 180 days, having a strict cut-off within the outcome measure may not be appropriate. Nevertheless, we find that performance on our main outcomes is highly correlated (with correlations of at least 0.8) with performance on the other specifications in columns 3-5 in Table 4. This also holds when using only falls and fractures instead of all avoidable hospitalizations as an outcome. However, we find a statistically significant bias in observed performance on one-year mortality, which discourages the use of this outcome as a quality measure.

Panel A shows how our baseline performance estimates correlate with the ones specified in each of the columns. Panel B reports the forecast coefficient, which is equal to the coefficient of predicted performance of the chosen nursing home (by the first-stage regression) in a regression with the individual level outcome as a dependent variable. Panel C tests whether the forecast coefficient $\hat{\lambda}$ is different from one. Column 1 includes average property (house) value, average household income and the share of households living below the poverty threshold at the neighbourhood level as additional control variables. Column 2 includes neighbourhood fixed effects. In the remaining columns, the outcome variables are specified respectively as 3) the occurrence of an adverse health outcome within 90 days; 4) within 365 days for mortality and being hospitalized due to a fall or fracture for avoidable hospitalization; 5) the number of days alive within 180 days after admission for mortality and the number of avoidable hospitalizations within 180 days. Standard errors between brackets. *** Statistically significantly different from zero at 1 percent; ** at 5 percent; * at 10 percent. Column 1 excludes individuals from very small neighbourhoods, as there is no data available for those. Estimates in Panel A are correlated with estimates obtained from this sample (excl. small neighbourhoods), but with the baseline model (no neighbourhood characteristics). Column 2 excludes individuals for whom the neighbourhood could not be identified. Column 4 for the mortality outcome excludes those admitted to a nursing home from January 2019 onward.

6 Correlations with quality indicators
In this section we examine to what extent quality indicators on other dimensions -like process and structure -can explain observed variation in outcomes. We explore the association of observed performance to publicly available measures of nursing home quality that are often used in comparisons (Castle and Ferguson, 2010;Spilsbury et al., 2011). 35 We use these results to evaluate whether performance on outcomes could complement the available indicators based on the other dimensions of quality.

Process and outcome quality indicators
The nursing home mortality scores are positively correlated with high levels of psychotropic medicine use which is a process-based indicator of low quality that is reported by the Dutch Health and Youth Care Inspectorate (Table 5). 180-day mortality in nursing homes in which all clients use psychotropic medicines is 2 percentage points higher compared to nursing homes in which none of its clients uses psychotropic medicine. Although the coefficient is rather small, the sign of the correlation is in line with the medical literature (Bronskill et al., 2009). Psychotropic medicine use may be related to mortality through side effects that may be more harmful to an older population, like diarrhea (Lindsey, 2009) and delirium. On the other hand, the relationships may also be confounded by (unobserved) other types of nursing home quality. Phillips et al. (2018) argue for example that the number of registered nurse hours is one of the main drivers of antipsychotic medication use among nursing home residents, which may in turn affect mortality through other channels or processes.
Moreover, we find that nursing homes with high rates of avoidable hospitalizations (low quality) have lower pressure sore rates (high quality). At first sight, this correlation may appear to be opposite to what one would expect. However, the negative association with pressure sores may well be a result of residents spending a relatively long time in their nursing home beds, which increases the risk of pressure sores. At the same time, spending a lot of time in bed could prevent nursing home residents from falling, which is one of the main contributors to avoidable hospitalizations. In that case, the negative associations are plausible, although this may raise questions about the interpretation of the avoidable hospitalization outcome. Other possible explanations are that pressure sores may be underreported in the bottom performing nursing homes (Kaltenthaler et al., 2001), or that the (uncorrected) variation in pressure sores is driven by case-mix differences, which causes the relationship with performance to be negative if those in worse health (pressure sores ↑) are more likely to be admitted to better performing nursing homes (avoidable hospitalizations ↓).

Regression results of eight (2 × 4 (I-IV)) separate bi-variate regressions at the nursing home level with either performance on mortality (columns 1-2) or avoidable hospitalization (columns 3-4) as dependent variables. It uses the performance estimates obtained in Equation (2) after shrinkage. Nursing home characteristics are also at the nursing home facility level. Descriptive statistics of nursing home characteristics can be found in Appendix Table B1. Standard errors between brackets. *** Statistically significantly different from zero at 1 percent; ** at 5 percent; * at 10 percent.

35 Almost every characteristic that we consider is an average over multiple years between 2015 and 2018. The descriptive statistics and a more elaborate description of these characteristics can be found in Appendix Table B1.

36 When examining correlations of the same characteristics in bi-variate regressions, we find very similar results. The only difference is that staff absenteeism becomes statistically significant at 10 percent when excluding the other characteristics as covariates.

Structure quality indicators
Table 6 presents associations between nursing home characteristics (or structure characteristics) and their mortality and avoidable hospitalization outcomes. Most estimated coefficients are relatively close to zero. This may mean that, although the multivariate regression includes various observed characteristics36, the results may be confounded by (unobserved) other types of nursing home quality, such as managerial quality, which may have offsetting effects: it may reduce the number of staff or higher educated nurses by empowering nurse aides, which in turn could improve outcomes (Barry et al., 2005). Yet, the very weak correlations may also imply that existing structure-based quality indicators do not accurately capture variation in performance on the health outcomes that we measure.
Nevertheless, there are some structure quality indicators that show a somewhat stronger association with performance on outcomes. For instance, we find that a relatively long waiting list is (weakly) negatively associated with nursing home mortality: having a one standard deviation larger waiting list to client ratio (of 12 percent) is associated with a 0.28 percentage points lower mortality rate. Caution is warranted as this association may be due to reverse causality; when nursing home mortality is low, turnover of clients is also low, which may in turn result in longer waiting lists. On the other hand, even in a situation in which mortality rates are not publicly available -as in the Netherlands -there may also be some perception of quality that makes nursing homes with lower mortality more popular.
Our results do not provide evidence for strong relationships between various staffing indicators and performance on health outcomes. This may seem surprising since some studies report that adverse outcomes are related to, for example, lower (registered) nurse employment (Friedrich and Hackmann, 2021;Lin, 2014) and higher nurse turnover (Antwi and Bowblis, 2018). However, the findings from a literature review (see Spilsbury et al. (2011)) suggest that the evidence on this topic has been contradictory. Our results do indicate that nursing homes with larger shares of specialists -like geriatricians and psychologists -relative to total staffing are likely to have lower mortality. Although this relationship is statistically significant, in economic terms it is relatively weak.
Finally, the reported coefficients in Table 6 suggest that the size of nursing home organizations is linked to performance on avoidable hospitalizations. Keeping the other observed characteristics fixed, an organization with 6 additional facilities (equal to one standard deviation) is associated with a 0.6 percentage points higher avoidable hospitalization rate. This implies that nursing homes that belong to a larger (chain) organization score worse on avoidable hospitalization performance. This finding is in line with quantitative evidence from the United States (You et al., 2016), who argue that this relationship could be explained by chain-targeted nursing homes being of lower quality because of, for example, a poor financial situation, both before and after acquisition. Qualitative evidence suggests that differences between low and high hospitalization nursing homes are related to how the staff approaches the decision to hospitalize (Cohen et al., 2017). Nursing homes with low rates generally make this decision case-by-case, whereas those with higher rates are more likely to approach it as an algorithmic process. The decision process may well be related to whether the nursing home belongs to a non-chain organization, since such homes are characterized by having more autonomous staff (Kruzich, 2005), being more flexible in care provision (Lucas et al., 2007) and since staff may have a more personal relationship with the residents.

Regression results of two multivariate regressions with either nursing home specific performance on mortality (column 1) or avoidable hospitalization (column 2) as dependent variables. It uses the performance estimates obtained in Equation (2) after shrinkage. Nursing home characteristics are either at the nursing home facility level or at the organisation level, which are copied to all facilities within the same organisation. Descriptive statistics of nursing home characteristics can be found in Appendix Table B1. Standard errors between brackets. *** Statistically significantly different from zero at 1 percent; ** at 5 percent; * at 10 percent.

Conclusion and discussion
As the quality of care in nursing homes has been the subject of vigorous public debate for decades, the sector might benefit from improved performance measurement based on health outcomes to complement the often-used structure and process measures (Barber et al., 2021; OECD, 2005; OECD and European Commission, 2013). However, quality estimates tend to rely on self-reported outcomes, be based on a small sample of residents, and be hampered by selection bias. The question is whether these challenges can be addressed and how outcome information can be used best to evaluate performance.
We have addressed the following three questions in this paper. First, how large is the variation in health outcomes across nursing homes? Second, to what extent can this variation be attributed to differences in performance of the nursing homes rather than to unobservable differences in case-mix? Third, is there any relationship with quality indicators based on structure and processes? If such existing indicators to a large extent explain variation in health outcomes, then complementing information on structure and processes with information on outcomes may be of less importance.
We use detailed administrative data to estimate variation in performance on two outcomes, mortality and avoidable hospitalization risk, when correcting for observable case-mix differences. In addition, we apply a novel test developed in the value-added literature to examine the role of selection bias in nursing home outcomes. Finally, we examine how these outcomes relate to nursing home characteristics on other dimensions.
After controlling for differences in case-mix, we find substantial heterogeneity in clients' health outcomes across Dutch nursing homes. Due to the small population sizes, the estimates are relatively imprecise, but we can statistically distinguish top performers from bottom performers. We find that the probability of dying or being hospitalized within 180 days after admission is 7 to 14 percentage points higher in the five percent worst performing nursing homes compared to the best. The variation in the avoidable hospitalization outcome is comparable to the variation in rehospitalization rates of Skilled Nursing Facilities in the United States reported by Rahman et al. (2016). Moreover, we do not find that unobserved heterogeneity in client characteristics due to non-random selection into nursing homes leads to biased performance estimates. The correlation with other indicators of provider quality is limited, indicating that outcome-based estimates supplement existing process and structure indicators.
Although our findings suggest that nursing homes vary in terms of outcomes, the imprecision in the point estimates is large compared to the observed differences. This means that even when using detailed data and noise-reducing methods like empirical Bayes, it remains difficult to measure variation in outcomes of small-scale providers. As a consequence, differentiating between nursing homes based on performance on outcomes, for example when benchmarking and in pay-for-performance schemes, should be done with caution, especially for nursing homes that are not at the extremes.
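To illustrate why empirical Bayes shrinkage reduces noise but cannot remove it for small providers, the following is a minimal sketch and not the estimator used in this paper. It assumes a simple method-of-moments setup in which each raw provider estimate has sampling variance equal to a common residual variance divided by the number of residents. The function name `shrink_estimates`, the residual variance of 0.16, and the toy numbers are hypothetical.

```python
import numpy as np

def shrink_estimates(raw, n_residents, resid_var):
    """Shrink noisy provider effects toward the overall mean (illustrative sketch)."""
    raw = np.asarray(raw, dtype=float)
    n = np.asarray(n_residents, dtype=float)
    noise_var = resid_var / n                      # assumed sampling variance per provider
    grand_mean = np.average(raw, weights=n)
    total_var = np.average((raw - grand_mean) ** 2, weights=n)
    # Method-of-moments signal variance: total variance minus average noise, floored at 0.
    signal_var = max(total_var - np.average(noise_var, weights=n), 0.0)
    reliability = signal_var / (signal_var + noise_var)
    return grand_mean + reliability * (raw - grand_mean), reliability

# Hypothetical raw outcome rates and admission counts for six providers.
raw = [0.35, 0.35, 0.10, 0.12, 0.30, 0.15]
n = [50, 500, 200, 300, 100, 400]
shrunk, reliability = shrink_estimates(raw, n, resid_var=0.16)
for r, size, s, w in zip(raw, n, shrunk, reliability):
    print(f"raw={r:.2f}  n={size:3d}  shrunk={s:.3f}  reliability={w:.2f}")
```

In this toy example the first two providers have the same raw rate, but the one with 50 admissions is pulled further toward the overall mean than the one with 500, which is why rankings of small providers away from the extremes carry little information.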
Our results imply that the observed variation in nursing home outcomes unbiasedly predicts variation in causal mortality and avoidable hospitalization performance. This is important, as it means that case-mix adjustment based on observable characteristics is sufficient for measuring nursing home performance based on outcomes, at least on average. However, selection bias may still be an issue in other settings: this study only includes non-profit nursing homes and things may be different in other institutional settings, e.g. for-profit nursing homes may have stronger incentives to attract healthier clients (Gandhi, 2020).

The exclusion of nursing homes with fewer than 50 admissions is also important: since our descriptive results show that there is at least some selection into larger versus smaller nursing homes based on observables, there might also be selection into these smaller homes based on unobserved characteristics.
While our results may be seen as reassuring with respect to selection at the aggregate level, they do not imply that observed performance is unbiased for every nursing home separately. Any selection bias due to unobserved heterogeneity may cancel out if it happens to be in the negative direction for some nursing homes and in the positive one for others. While the average bias could then be zero, it may still result in misclassification of certain high performers as some of the lowest, as was observed in the case of hospitals (Geweke et al., 2003; Gowrisankaran and Town, 1999; Hull, 2020). However, given that estimating a bias for individual entities requires sufficient statistical power, this investigation was not feasible in our setting with small providers. Whether case-mix adjusted performance on outcomes can be used to promote quality improvements through pay-for-performance incentives thus remains an open question.
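The distinction between zero average bias and unbiasedness for each nursing home can be made concrete with a small simulation. This is a stylized sketch on synthetic data, not an analysis of the data used in this paper: the number of homes, the distributions, and the deliberately exaggerated size of the bias are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_homes = 1000

true_effect = rng.normal(0.0, 1.0, n_homes)   # hypothetical causal performance
bias = rng.normal(0.0, 1.0, n_homes)          # hypothetical home-specific selection bias
bias -= bias.mean()                           # force the average bias to zero
observed = true_effect + bias                 # what a case-mix adjusted estimate would show

def quintile(x):
    """Assign each home to a quintile (0 = lowest values, 4 = highest)."""
    ranks = np.argsort(np.argsort(x))
    return ranks * 5 // len(x)

moved = quintile(true_effect) != quintile(observed)
print(f"mean bias: {bias.mean():.3f}")
print(f"share of homes placed in a different quintile: {moved.mean():.2f}")
```

Even though the bias averages out across homes by construction, a share of homes ends up in a different performance quintile than their true effect would imply, which is the kind of misclassification that an aggregate test cannot rule out.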
All in all, our results suggest that in designing policies to improve the quality of nursing home care, such as public reporting of quality, the dashboards should be expanded with outcome measures. This is especially important in the long-term care sector, for which expenditures are expected to grow but for which information on outcomes is limited. Having such additional information is useful, both for nursing homes themselves to identify where improvements may be achievable, and for users aiming to make more informed choices. As our findings suggest that there is no detectable selection bias at the aggregate level, directing users to the nursing homes with the best observable case-mix adjusted outcomes could generate health benefits.

This table reports whether the hospitalization within 180 days after nursing home admission was an outpatient visit, a one-day inpatient stay or an overnight inpatient stay. It includes individuals who had at least one hospitalization within 180 days after nursing home admission, separated by whether or not at least one of these hospitalizations was potentially avoidable.

Care intensity
A care intensity package (in Dutch: Zorgzwaartepakket (ZZP)) is a proxy for the intensity of nursing home care that the recipient needs according to an independent care assessor from the Care Assessment Centre (CIZ). Residents with lower care intensity (ZZP 4) need intensive support and extensive care; those with dementia care needs (ZZP 5) need a protective living facility with intensive dementia care; those with higher care intensity need a protective living facility with intensive support and care; and those with the highest care intensity (ZZP 7 and 8) need a protective living facility with very intensive care and treatment or support (CIZ, nd).

Healthcare expenditures
Yearly healthcare expenditures in logarithmic form from the calendar year before admission, obtained from claims data.

Wealth
Wealth from assets and savings (excluding home equity) of the household in the calendar year prior to admission. Variable enters the control function in logarithmic form.

Standardized household income
Yearly disposable household income in the calendar year before admission from tax registries, standardized for household size. Variable enters the control function in logarithmic form.

Number of medicines
The number of different types of medicine that resident i received in the calendar year before admission. Types are distinguished based on ATC3 codes.

Charlson comorbidity index
The Charlson comorbidity index is a weighted score based on 12 different comorbidities: congestive heart failure, dementia, pulmonary disease, connective tissue disorder, liver disease, diabetes, diabetes complications, paraplegia, renal disease, cancer, metastatic cancer, severe liver disease and HIV (Quan et al., 2011). Diagnoses are obtained from hospital visits in the 365 days before nursing home admission.

Hospital in last month
An indicator for whether the individual was hospitalized in the 30 days before admission to a nursing home.

Rural
This binary variable indicates whether the resident's prior home was in a rural (equals 1) or urban (equals 0) municipality. We categorize a municipality as urban if the area is at least moderately urbanised (> 1,000 addresses per square kilometer) (Statistics Netherlands, nd).

Year
Indicates in which year the individual is admitted to a nursing home.

For this exercise we distinguish between four groups: (1) clients with a lower level of care intensity; (2) clients with a higher level of care intensity; (3) clients with the highest level of care intensity; and (4) clients with dementia care needs. These are identified by the care intensity package, as explained in Footnote 6 and Appendix Table B4. For each of these groups, we regress the individual level outcome (mortality or avoidable hospitalization) on the observed performance score of the nursing home to which the individual is admitted ($\delta_{ij}$). Standard errors between brackets. *** Statistically significantly different from zero at 1 percent; ** at 5 percent; * at 10 percent.

This figure shows Kaplan-Meier survival curves for time to death (mortality) and time to an avoidable hospitalization by care intensity package. Time to death for those who enter the nursing home with a higher care intensity is, on average, shorter than for those with a lower care intensity. On the other hand, those who have relatively high or dementia care needs stay longer without having an avoidable hospitalization. This can partly be explained by this group dying earlier.

Figure C2
This figure (nonparametrically) shows the relationship between performance of the chosen nursing home (i.e. the endogenous variable) and performance of the closest nursing home (i.e. the instrumental variable) for both outcomes: mortality (a) and avoidable hospitalization (b). The size of each data point reflects the size of the group on which the average is based.
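A nonparametric plot of this kind is commonly built from binned averages. The sketch below shows one way to compute them, under the assumption that observations are grouped into quantile bins of the instrument; how the groups in Figure C2 were actually formed is not stated here. The function name `binned_means` and the synthetic data are hypothetical.

```python
import numpy as np

def binned_means(x, y, n_bins=10):
    """Average x and y within quantile bins of x and report each bin's size."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Use interior edges only, so every observation falls in a bin 0..n_bins-1.
    bin_id = np.digitize(x, edges[1:-1])
    out = []
    for b in range(n_bins):
        mask = bin_id == b
        if mask.any():
            out.append((x[mask].mean(), y[mask].mean(), int(mask.sum())))
    return out

# Synthetic stand-ins: performance of the closest home (instrument) and of the chosen home.
rng = np.random.default_rng(1)
closest = rng.normal(size=5000)
chosen = 0.5 * closest + rng.normal(scale=0.5, size=5000)

bins = binned_means(closest, chosen)
for x_mean, y_mean, size in bins:
    print(f"closest={x_mean:6.2f}  chosen={y_mean:6.2f}  n={size}")
```

Each tuple corresponds to one data point in such a figure: the two means give its coordinates and the group size can be mapped to the marker size.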