Do incentives matter when working for god? The impact of performance-based financing on faith-based healthcare in Uganda

Can extrinsic incentives motivate faith-based healthcare providers? This paper challenges the finding that religious providers are intrinsically motivated to serve (poor) patients, and that extrinsic incentives may crowd out such motivation. We use a unique panel of output and expenditure data from small faith-based nonprofit healthcare facilities in Uganda to estimate the effect of introducing performance-based financing. The output of the observed facilities is less than 50% of their potential. Performance-based financing increases output and efficiency robustly by at least 27%, with no apparent reduction in the perceived quality of services. Religious nonprofit healthcare providers may well be intrinsically motivated, but respond positively to extrinsic incentives. Whether working for god or not, incentives matter.


Introduction
Public healthcare systems in many developing countries suffer from severe dysfunctionalities and endemic absenteeism: among public health workers in Uganda, unauthorized absence from duty may be as high as 50% (Björkman & Svensson, 2009). One way to improve the delivery of health services is to allow competition among different providers, regardless of their ownership status, guided by a principle of non-discrimination in the allocation of resources. This calls for a shift of responsibilities to the private and nonprofit sector. In fact, private healthcare represents a large share of health provision around the world, and this share is even greater among the poorest and most vulnerable. According to the World Health Organization Global Health Expenditure database, private healthcare providers account for 60% of health spending in low-income countries (Walton & Matthees, 2017). While for-profit enterprises are growing rapidly, especially in urban areas, a large share of private providers is still faith-based: the so-called religious nonprofit organizations (RNPOs). In Uganda, 82% of all private nonprofit health facilities are coordinated by one of three faith-based organizations: the Uganda Protestant Medical Bureau (UPMB), the Uganda Catholic Medical Bureau (UCMB), and the Uganda Muslim Supreme Council (UMSC), with a far greater share among smaller dispensaries in rural areas (Reinikka & Svensson, 2010). Starting in 2000, the Ugandan government ran a program in which every nonprofit primary health unit received an untied grant to help it offer its services. In a seminal paper, Reinikka and Svensson (2010) show that RNPOs responded to this unconditional surge in resources by increasing output. They interpret this as consistent with the view that religious nonprofit providers are "working for God", and thus intrinsically motivated and nonopportunistic.
Given growing budget pressures in many countries, and growing frustration with the lack of progress engendered by standard funding practices, a different approach to increasing healthcare output has recently become more popular: setting incentives that make the amount of funding a healthcare provider receives conditional on performance. There is an increasing body of evidence that Performance-Based Financing (PBF), as this approach is often referred to, can increase both output and efficiency of healthcare in developing countries if the incentives are clear and well designed (Brenzel, 2009; Bhatnagar & George, 2016; Eldridge & Palmer, 2009; Hecht, Batson, & Brenzel, 2004; Honda, 2013; Novignon & Nonvignon, 2017). So far, however, most rigorous studies on PBF have focused on public and private healthcare facilities, and did not investigate the heterogeneity of outcomes across sectors (Banerjee, Glennerster, & Duflo, 2008; Basinga et al., 2011; Bonfrer, Van de Poel, & Van Doorslaer, 2014; Bhatnagar & George, 2016; Meessen, Kashala, & Musango, 2007; Morgan, 2010; Sekabaraga, Diop, & Soucat, 2011).[2] Do intrinsically motivated RNPOs respond to extrinsic incentives in the same way as the public sector? Or do extrinsic incentives erode the intrinsic motivation inherent to religious nonprofit healthcare outfits, potentially going as far as reducing their efficiency and quality of service?
We use a panel dataset from Uganda spanning a period of thirteen years and up to 246 small- to mid-sized health units belonging to the UCMB to estimate the effects of introducing PBF on healthcare output and, by extension, on the efficiency of their healthcare service delivery. We first analyze the data using data envelopment analysis (DEA) and stochastic frontier analysis (SFA), standard approaches in healthcare production studies that both estimate the degree of inefficiency compared to some optimal benchmark frontier. Frontier efficiency measurements are common in studies focused on healthcare provision in developed countries but, to the best of our knowledge, we are the first to apply them in a PBF evaluation in the context of a developing country. Next, we estimate the parameters of a production function by means of a regression analysis and a more general parametric approach, using a dynamic version of the generalized method of moments (system-GMM). In this case, PBF can be seen as a new technology which shifts the whole production frontier. We find that the output of the observed facilities is less than 50% of their potential. Also, performance-based financing increases output and efficiency robustly by at least 27%. By conducting an independent client satisfaction survey, we also show that this does not come at the expense of the perceived quality of services provided. Jointly, these results point towards an RNPO sector that is both responsive to extrinsic incentives and in dire need of increased efficiency. Whether working for God or not, incentives seem to matter, and can help deliver more healthcare services at a lower cost.
The remainder of the paper is structured as follows: in Section 2 we provide the background for the study, focusing on the latest literature on PBF and a description of the healthcare system in Uganda. Section 3 describes the data, while the methodological approach is outlined in Section 4. We present the main results on efficiency in Section 5 and additional results on perceived quality of service in Section 6. In Section 7, we discuss the results and conclude.

PBF: one policy, many faces
The growing evidence that the health of its population is an important determinant of a country's economic growth (Bloom & Canning, 2000; Weil, 2007) has provided an additional argument, besides the ethical ones, for the need for functional and accessible healthcare provision. In contrast to the traditional perception that healthcare provision is essentially a function of structural inputs (including people, infrastructure, knowledge, drugs, material, equipment, and technology), the PBF paradigm focuses on the processes transforming these inputs into outputs (Eichler, 2006). Though usually thought of as complementary, the right processes can, to an extent, make up for the lack of inputs (Peabody, Tozija, Munoz, Nordyke, & Luck, 2004). By improving the transforming processes, more output can be produced using the same limited inputs.
In recent years, PBF has become one of the favorite ways to stimulate such improvements (Brenzel, 2009; Eldridge & Palmer, 2009; Hecht et al., 2004; Honda, 2013). However, while the principal-agent problem has been successfully reduced by conditioning payment on performance in many other professional contexts (Miller, 2008; Zhao, 2005), it is rather difficult in processes with such multi-dimensional output as healthcare. While some level of agreement on best practices (increasingly grounded in economic theory and based on achieving specific, measurable, attainable, relevant, and time-bound, or SMART, indicators) has emerged over the past years (Fritsche, Soeters, & Meessen, 2014), the way in which various PBF components affect the multiple dimensions of healthcare delivery is still not fully understood (Renmans, Holvoet, Orach, & Criel, 2016; Renmans, Holvoet, Criel, & Meessen, 2017b).
Even if performance is understood in its most limited sense as output, thus excluding quality considerations, the many different types of output produced by a healthcare provider have to be taken into account when assessing its performance, either individually or according to some conversion logic. Expanding the notion of performance to include the quality and relevance of produced output, which are hard to quantify in a single metric, complicates the matter even further, and there is an ongoing debate on how best to measure these aspects of healthcare production. Despite these challenges, recent PBF schemes have now incorporated quality indicators in their design, typically based on checklists of observable structural and process measures (Josephson et al., 2017).[3] Such designs usually pay for output conditional on quality in a setting where the principal contracts the health facilities and the management of these providers then contracts the staff.
While this general framework is becoming commonplace, PBF programs operate in their specific settings, and many try to experiment with innovations, making each design unique. In our case, incentive payments are determined at the facility level, and make up only a fraction of total facility income with capped incremental bonuses. The allocation of the bonus payments is at the discretion of the in-charge of the facility, and the bonuses are typically redistributed to employees.[4] Several studies document positive effects of PBF, at least on public healthcare delivery (Basinga et al., 2011; Bonfrer et al., 2014; Meessen, Musango, Kashala, & Lemlin, 2006; Meessen et al., 2007; Soeters et al., 2011; Sekabaraga et al., 2011). Others find no lasting effects (Banerjee et al., 2008; Morgan, 2010; Turcotte-Tremblay, Spagnolo, De Allegri, & Ridde, 2016), typically when incentives do not trickle down to individuals in one way or another, or when other PBF design feasibility criteria are not met.
Existing literature, however, also identifies several potential pitfalls of PBF. Oxman and Fretheim (2008) warn against the danger of widening the already existing gap between poorly- and well-performing facilities, which may lead to an increasing gap in access to quality healthcare. Other concerns include the risk of increased gaming, i.e. systematic reporting bias (Kalk, Paul, & Grabosch, 2010; Kalk, 2011; Lu, 1999), target-led distortions resulting in the production of services with negative marginal value (Wynia, 2009), and cherry-picking of patients who are most suited to achieve targets (Ireland, Paul, & Dujardin, 2011).[5] Finally, direct financial incentives, as well as the bureaucratization of healthcare delivery, which is, to an extent, necessary to implement a PBF program, may end up crowding out intrinsic motivation (Frey & Jegen, 2001; Ireland et al., 2011; Kalk, 2011), inducing a decline in physician professionalism and morale (Wharam et al., 2009). The importance of intrinsic motivation in healthcare provision is clear (Brock, Lange, & Leonard, 2016; Leonard & Masatu, 2010; Reinikka & Svensson, 2010), and the theory behind extrinsic incentives crowding it out is well established (Deci & Ryan, 1985). However, its applicability to PBF is still subject to discussion (Lohmann, Houlfort, & De Allegri, 2016), and empirical evidence of the effects of PBF on intrinsic motivation (which could be specific to each PBF scheme) remains inconclusive (Bertone, Lagarde, & Witter, 2016; Bhatnagar & George, 2016; Chimhutu, Lindkvist, & Lange, 2014; Lohmann et al., 2018; Shen et al., 2017). Many of these pitfalls, and especially the last one, acquire particular salience when embedded in faith-based, mission-driven, nonprofit settings. If PBF crowds out intrinsic motivation, it may even have the opposite effect to the desired one.
[2] Even though some of these studies do include a share of faith-based nonprofit facilities, they do not investigate the differences that may arise from variations in intrinsic motivation.
[3] While indices of structural and process measures are now probably the most common way in which PBF schemes promote quality of healthcare production, other approaches have also been employed. The PBF program described by Soeters et al. (2011), for example, tries to ensure quality maintenance through comprehensive agreements with providers and regulators, and measures it through patient-perceived quality surveys that do not directly influence bonuses, and through quality reviews done by peripheral health authorities at primary level or by peer group reviews at hospital level.
[4] This has since been standardized in most PBF designs by the indices management tool, which uses a group evaluation system rather than one where only the in-charge decides on the individual bonus payments of staff.
[5] These unintended consequences may be a sign that the PBF design offers too strong incentives. To help diminish this concern it is important to regularly review targets and incentives, as is the case in our setting.

Setting and intervention
The health sector in Uganda is characterized by a high degree of fragmentation with a mixture of public, private nonprofit and private for-profit healthcare providers (Björkman & Svensson, 2009), with government health spending representing only about a quarter of the total in most years since 2000 (World Health Organization, 2018). Although the Ugandan Ministry of Health takes non-governmental providers into account in its planning, providing partial funding to some of them, private health facilities and practices account for half of Uganda's reported healthcare output and operate independently of public ones (Government of Uganda - Ministry of Health, 2010). The policies governing healthcare services in the country are consequently as diverse as the service providers. In this complex situation, the Ugandan Ministry of Health piloted a large-scale PBF program for healthcare providers, which mostly turned out to be a failure due to a number of design problems (Morgan, 2010; Ssengooba, McPake, & Palmer, 2012). Other PBF programs have since been implemented in the country, including one which involves RNPO-run facilities (Renmans, Holvoet, & Criel, 2017a), but these have not yet, to the best of our knowledge, been comprehensively evaluated.
In this paper, we focus on one of the largest nonprofit private healthcare providers in the country: the Uganda Catholic Medical Bureau (UCMB). The UCMB runs an extensive country-wide network of hospitals and health centers accounting for over a third of private healthcare facilities in Uganda (Government of Uganda - Ministry of Health, 2010). The structure of the administration of the UCMB healthcare facilities mirrors that of the Catholic church itself: each of the 15 dioceses has its own health office, which is responsible for the operation of the health units within its territory. The central UCMB office coordinates the diocesan health offices, sets policy on the national level, monitors and evaluates the diocesan offices and the individual health units, and represents them nationally and internationally.
In 2008, the UCMB selected the diocesan health office in Jinja, one of the smaller dioceses, to pilot a PBF scheme in its six health centers to test the practical feasibility of this new approach before possibly extending the scheme to all its health units.[6] The Diocese of Jinja was selected for its manageable size, for its long cooperation with Cordaid, an international NGO which helped set the scheme up, and for its historically good performance relative to other dioceses.
The scheme was introduced in the 2008/2009 fiscal year. It was supported by Cordaid with financial, material, and technical assistance,[7] but implemented by the diocesan health office in Jinja. For each health unit, the scheme set two long-term output targets based on its performance in the previous 3 years, in consultation with the unit's management. The targets are defined in terms of the Standard Unit of Output (SUO), a weighted average of the most commonly performed procedures and services, taking into account their relative input requirements in terms of cost and time:

$$SUO = w_1\,outpatients + w_2\,inpatients + w_3\,deliveries + w_4\,ANC + w_5\,immunizations,$$

where outpatients, inpatients, deliveries, ANC, and immunizations are the numbers of outpatient visits, inpatient admissions, deliveries, ante-natal care visits (including family planning), and immunizations respectively, and $w_1, \dots, w_5$ are the corresponding weights. Starting in fiscal year 2009/2010, each of the participating facilities received a financial bonus conditional on reaching the lower pre-specified target, and incrementally increasing up to its maximum if the upper target was reached.[8, 9] The realized bonuses increased the income of participating facilities (about a third of which comes from user fees and the remainder from government grants) by about 5% on average. Although targets and bonuses were set at the unit level, the heads of the units typically used at least some of the extra income to top up the salaries of their employees proportionally to their hours worked, thus bringing individual incentives in line with those at the facility level.
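The two-target bonus rule described above can be illustrated with a short Python sketch. Everything numeric here is an assumption made for illustration only: the weights in `SUO_WEIGHTS`, the number of bonus steps, and the step sizes are hypothetical placeholders, since the text only states that the bonus starts at the lower target and increases incrementally up to a maximum at the upper target.

```python
import math

# Hypothetical weights for illustration only; not taken from the text above.
SUO_WEIGHTS = {
    "outpatients": 1.0,
    "inpatients": 15.0,
    "deliveries": 5.0,
    "anc": 0.5,
    "immunizations": 0.2,
}


def standard_unit_of_output(counts, weights=SUO_WEIGHTS):
    """Weighted sum of service counts (one composite output figure)."""
    return sum(weights[name] * counts[name] for name in weights)


def pbf_bonus(suo, lower_target, upper_target, max_bonus, steps=4):
    """Stepwise bonus: zero below the lower target, max_bonus at or above
    the upper target, and equal increments in between (assumed schedule)."""
    if suo < lower_target:
        return 0.0
    if suo >= upper_target:
        return float(max_bonus)
    progress = (suo - lower_target) / (upper_target - lower_target)
    reached_step = 1 + math.floor(progress * (steps - 1))
    return max_bonus * reached_step / steps


counts = {"outpatients": 1000, "inpatients": 100, "deliveries": 20,
          "anc": 200, "immunizations": 500}
suo = standard_unit_of_output(counts)
bonus = pbf_bonus(suo, lower_target=2500, upper_target=3500, max_bonus=100)
```

A capped schedule of this kind rewards output growth only between the two targets, which is why the choice and regular revision of targets matters for the strength of the incentive.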

Panel data
To gauge the effect of PBF on the efficiency of healthcare delivery, we use a range of output and input measurements collected by the UCMB from all its health units, which amount to a panel spanning up to 246 mid-sized health units over a period of thirteen years (fiscal year 2001/2002 to fiscal year 2013/2014). The last six fiscal years follow the introduction of PBF in the treated centers.
The data are compiled by individual health facilities, which report them to their superior diocesan health office. The diocesan health offices verify the data against the ledgers of randomly selected facilities on a monthly basis, and send them on to the UCMB, which uses the same random selection system to verify the data from the health offices. The data are collected for internal monitoring purposes, to guide decision making, and as an advocacy device. The UCMB engages in monthly data verification activities and omits any data that it has reason to consider unreliable.
Output is measured using the Standard Unit of Output (SUO, see Section 2.2 for details). Total expenditures are measured in millions of 2014 Ugandan Shillings (USh.). Panel descriptive statistics for SUO, as well as for the other two main factors of production, i.e. the total number of staff (staff) and capital proxied by bed capacity (beds), are presented in Table 1. Fig. 1 shows the trends in output of PBF and control facilities.[10] It plots the average output for PBF and control facilities after standardizing each facility's output at 100 at the beginning of the observed period. The dotted vertical line marks the introduction of the PBF scheme.
[6] One of the health centers dropped out of the scheme a few years after its introduction due to unspecified issues. Preferring to err on the cautious side, we treat it as under PBF for the remaining years as well. The PBF scheme has since been extended to government-run health centers in the same area to test its feasibility in a public setting. Since then, it has not, to the best of our knowledge, been expanded further.
[7] Cordaid covered the financial costs of the scheme, offered technical support and training, and, where missing, provided IT equipment.
[8] Technically, output was only one of 5 aspects of performance incentivized by the PBF scheme. The others were equity (measured in user fees per SUO), efficiency (measured in cost per SUO), productivity (measured in SUO per number of qualified staff), and quality of service delivery (measured in terms of qualified staff per total staff, combined with results of drug prescription and patient satisfaction surveys). However, the targets set for these indicators were generally low enough to be met in nearly all cases. This leaves the output targets as the only ones practically affecting the size of the PBF bonuses paid out.
[9] This PBF design is rather uncommon in two ways: As recommended by Fritsche et al. (2014), PBF programs typically weigh outputs in monetary terms. More importantly, most do not involve output targets, but rather reward each unit of output, the notable exception being a PBF initiative in Malawi focused on maternal and newborn health (Lohmann et al., 2018).
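The standardization behind the output trends can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used for the figure: the helper names are hypothetical, missing years are represented by `None`, and "the beginning of the observed period" is taken to mean each facility's first year with data.

```python
def index_to_base_100(series):
    """Rescale one facility's output so its first observed year equals 100."""
    base = next(value for value in series if value is not None)
    return [None if value is None else 100.0 * value / base for value in series]


def average_index(panel):
    """Average the base-100 indices across facilities, year by year,
    skipping facilities with no data in a given year."""
    indexed = [index_to_base_100(series) for series in panel]
    n_years = len(indexed[0])
    means = []
    for t in range(n_years):
        observed = [series[t] for series in indexed if series[t] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


# Toy unbalanced panel: the second facility enters one year late.
toy_panel = [[50.0, 75.0, 100.0], [None, 200.0, 300.0]]
trend = average_index(toy_panel)
```

Indexing each facility to its own starting level keeps large facilities from dominating the group average, so the plotted lines reflect relative growth rather than absolute size.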

Empirical strategy
Most analyses of healthcare production involve data envelopment analysis (DEA) or stochastic frontier analysis (SFA) approaches (Hollingsworth, 2008). Each approach has its advantages as well as drawbacks. DEA requires neither distributional assumptions nor a functional form for the production technology, but it is very sensitive to outliers and measurement error due to its non-stochastic nature. This is a problem especially in the context of developing countries where the reliability of data may be in question. SFA methods are considerably less sensitive to outliers and measurement error than DEA, but require imposing assumptions on the distribution of efficiency or the error term and about the form of the production function. In what follows, we use both approaches to model the production of healthcare as a function of three groups of inputs: expenditures, labor (number of staff) and capital (proxied by the number of beds). Complementarily, we present the results of a parametric analysis, starting from a pooled ordinary least squares (OLS) model, and gradually relaxing simplistic assumptions, correcting the initial estimate to include time and facility fixed effects, first-order autocorrelation, and ultimately modeling healthcare production as a dynamic process (Scott & Coote, 2010).

Data envelopment analysis
The DEA approach, where efficiency scores are obtained nonparametrically through linear programming methods and regressed on facility characteristics, is popular because its nonstochastic first stage does not require one to make any assumptions about the functional form of production technology. Instead, the efficiency factor $\theta$ is defined as the output of each facility (or decision-making unit, DMU, in the parlance of DEA) relative to the output of a virtual facility with the same levels of input, which in turn is a linear combination of the most efficient facilities in the dataset. As such, $\theta$ is a facility's current output expressed as a fraction of its maximum potential output given the current levels of inputs and maximum efficiency. Formally, we solve the following linear program for each facility in each fiscal year:

$$\max_{\theta,\lambda}\ \theta^{-1} \quad \text{subject to} \quad -\theta^{-1}\,SUO_i + \mathbf{SUO}'\lambda \ge 0, \quad x_i - X'\lambda \ge 0, \quad \mathbf{1}_N'\lambda = 1, \quad \lambda \ge 0,$$

where $0 \le \theta \le 1$, $SUO_i$ is the output of the $i$-th facility, $x_i$ is a vector of its inputs (expenditures, beds, staff), $\mathbf{SUO}$ is a vector of outputs of all the $N$ facilities in a given year, $X$ is the corresponding matrix of their inputs, $\lambda$ is an $N \times 1$ vector of constant weights, and $\mathbf{1}_N$ is an $N \times 1$ vector of ones.

Notes to Table 1: Means (with standard deviations in parentheses) are reported for SUO, expenditures, staff, and beds. (a) The panel has considerable gaps, leading to differences in how many of the six PBF facilities data is available for from year to year. This is also the case for control facilities. However, most of the unbalancedness of the panel stems from an expansion of the dataset through time, and to a much lesser extent through attrition and time-series gaps. Our analysis addresses this issue.
In the second stage, we estimate the effects of the introduction of PBF on the efficiency score $\theta$, using the double-bootstrapped maximum likelihood truncated regression procedure proposed by Simar and Wilson (2007) to correct for biases arising from within-facility correlation of efficiency:

$$\theta_{it} = \beta\,PBF_{it} + \delta_i + \tau_t + \varepsilon_{it},$$

where $PBF_{it}$ equals 1 if the $i$-th facility had a PBF program in place in fiscal year $t$ and 0 otherwise, $\delta_i$ are spatial fixed effects on the diocese level, $\tau_t$ are fiscal year fixed effects, and $\varepsilon_{it}$ is an error term. We use code adapted from Wolszczak-Derlacz and Parteka (2011) to estimate the second stage.
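The first-stage linear program can be illustrated with a minimal Python sketch using NumPy and `scipy.optimize.linprog`. It is not the code used in the paper: it assumes an output-oriented program with a convexity constraint on the weights and strictly positive outputs, and the three-facility dataset is hypothetical. The bootstrapped second stage is not shown.

```python
import numpy as np
from scipy.optimize import linprog


def dea_output_efficiency(inputs, outputs):
    """Efficiency score per facility: actual output as a fraction of the
    output of the best convex combination of peers using no more inputs."""
    X = np.asarray(inputs, dtype=float)   # shape (N, number of inputs)
    y = np.asarray(outputs, dtype=float)  # shape (N,), must be positive
    n, m = X.shape
    scores = np.empty(n)
    for i in range(n):
        # Decision variables: [phi, lambda_1, ..., lambda_N]; maximize phi.
        cost = np.zeros(n + 1)
        cost[0] = -1.0
        # Output constraint: phi * y_i - y'lambda <= 0.
        output_row = np.concatenate(([y[i]], -y))
        # Input constraints: X'lambda <= x_i.
        input_rows = np.hstack((np.zeros((m, 1)), X.T))
        A_ub = np.vstack((output_row, input_rows))
        b_ub = np.concatenate(([0.0], X[i]))
        # Convexity constraint: the weights sum to one.
        A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)
        result = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                         bounds=[(0, None)] * (n + 1), method="highs")
        if not result.success:
            raise RuntimeError(result.message)
        scores[i] = 1.0 / result.x[0]  # efficiency is the inverse of phi
    return scores


# Hypothetical data: one input and one output for three facilities.
scores = dea_output_efficiency([[1.0], [2.0], [2.0]], [1.0, 4.0, 2.0])
```

In this toy case the third facility uses as much input as the second but produces half its output, so it receives a score of 0.5 while the other two lie on the frontier. Because the frontier is built from observed units only, a single mismeasured facility can shift the scores of many others, which is the sensitivity to outliers discussed in the text.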

Stochastic frontier analysis
The deterministic nature of DEA gives the first stage of this approach the advantage of not necessitating any functional or distributional assumptions, but it also renders it extremely sensitive to outliers. The issue is further aggravated in developing country contexts like ours, where data collection methods are often suboptimal, making the data exceptionally noisy. SFA, the most common alternative to DEA, is considerably less sensitive to outliers and noise in the data, but does require an explicitly defined production function. To allow for sufficient functional flexibility, we use a translog production function with expenditures, number of staff and capital (proxied by bed capacity) as factors of production:

$$\ln SUO_{it} = \gamma\,factors_{it} + v_{it} - u_{it}, \tag{4}$$

where $factors_{it}$ is a vector representing a translog production function with $expenditures_{it}$ (total expenditures in millions of USh.), $staff_{it}$ (total number of staff), and $beds_{it}$ (bed capacity proxying for capital) as production factors, $u_{it}$ is a measure of inefficiency (a Euclidean distance from the estimated production frontier) with a truncated normal distribution, and $v_{it}$ is a normally distributed stochastic error term. Following Battese and Coelli (1995), we use a maximum likelihood random effects model to estimate the impact of PBF on $u_{it}$:

$$u_{it} = \beta\,PBF_{it} + \delta_i + \tau_t + w_{it},$$

where $w_{it}$ is a normally distributed stochastic error term and the rest of the notation is the same as above, so that:

$$\ln SUO_{it} = \gamma\,factors_{it} + v_{it} - \left(\beta\,PBF_{it} + \delta_i + \tau_t + w_{it}\right).$$
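To make the translog specification concrete, the following Python sketch builds the vector of regressors for one facility-year from the three production factors. The function name and the ordering of terms are hypothetical, the factor of one half on the squared terms is a common convention rather than something stated in the text, and all inputs are assumed to be strictly positive so that their logarithms exist.

```python
import math
from itertools import combinations


def translog_terms(expenditures, staff, beds):
    """Return the nine translog regressors for one observation:
    three log inputs, three half-squared logs, three pairwise products."""
    logs = [math.log(expenditures), math.log(staff), math.log(beds)]
    half_squares = [0.5 * value * value for value in logs]
    cross_products = [a * b for a, b in combinations(logs, 2)]
    return logs + half_squares + cross_products


# Example: inputs chosen so that the logs are 1, 0 and 2.
terms = translog_terms(math.e, 1.0, math.e ** 2)
```

The squared and interaction terms are what give the translog form its flexibility: output elasticities are allowed to vary with the level of each input instead of being fixed, as they would be in a Cobb-Douglas specification that keeps only the first three terms.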

Panel regression analysis and dynamic system GMM
To ease potential concerns about the assumptions and drawbacks of both DEA and SFA, we also present the results of a parametric analysis, regressing output on PBF and the equivalent of the same translog production function as in (4). We then correct a bias due to the presence of facility-level fixed effects by estimating a fixed-effects (FE) model, and we include fiscal year fixed effects to produce the following fully specified model:

$$\ln SUO_{it} = \beta\,PBF_{it} + \gamma\,factors_{it} + \eta_i + \tau_t + \varepsilon_{it},$$

where $PBF_{it}$ is a dummy equal to one if PBF is in place in the given facility and fiscal year, $factors_{it}$ is a vector representing a translog production function with $expenditures_{it}$ (total expenditures in millions of USh.), $staff_{it}$ (total number of staff), and $beds_{it}$ (bed capacity as a proxy for capital) as production factors, $\eta_i$ are facility-level fixed effects, $\tau_t$ are fiscal year fixed effects, and $\varepsilon_{it}$ is a stochastic error term. Although we technically estimate the effect of PBF on output rather than on allocative efficiency, this has no consequence for the magnitude and statistical significance of the estimated effect $\beta$, since all factors of production enter on the right-hand side of the regression equation. Substituting allocative efficiency for output on the left-hand side would only affect the values of $\gamma$, leaving $\beta$ unaffected.
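The logic of the two-way fixed-effects estimator can be shown with a small Python sketch. It is a deliberate simplification and not the estimation code used here: it assumes a balanced panel, drops the translog controls, and keeps the PBF dummy as the only regressor, so that the estimator reduces to a ratio of sums after double demeaning. The toy data are simulated without noise.

```python
def two_way_within(panel):
    """Subtract facility means and year means (adding back the grand mean)
    from a balanced panel stored as panel[facility][year]."""
    n, periods = len(panel), len(panel[0])
    unit_means = [sum(row) / periods for row in panel]
    time_means = [sum(panel[i][t] for i in range(n)) / n for t in range(periods)]
    grand_mean = sum(unit_means) / n
    return [[panel[i][t] - unit_means[i] - time_means[t] + grand_mean
             for t in range(periods)] for i in range(n)]


def fe_slope(outcome, treatment):
    """Two-way fixed-effects slope for a single regressor."""
    y_w, x_w = two_way_within(outcome), two_way_within(treatment)
    numerator = sum(y * x for row_y, row_x in zip(y_w, x_w)
                    for y, x in zip(row_y, row_x))
    denominator = sum(x * x for row_x in x_w for x in row_x)
    return numerator / denominator


# Toy data: facility 0 adopts PBF in the last two of four years.
pbf = [[0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
facility_effects = [1.0, 2.0, 3.0]
year_effects = [0.0, 0.1, 0.2, 0.3]
true_effect = 0.3
log_suo = [[facility_effects[i] + year_effects[t] + true_effect * pbf[i][t]
            for t in range(4)] for i in range(3)]
estimate = fe_slope(log_suo, pbf)
```

The demeaning removes anything that is constant within a facility or common to all facilities in a year, so the estimate is identified only from how treated facilities change relative to controls after adoption.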
It can be expected that performance in one period is heavily influenced by performance in previous time periods (Scott & Coote, 2010). We test for serial autocorrelation following Wooldridge (2002), and verify the assumption of stationarity using a series of augmented Dickey-Fuller (ADF) tests such as the Fisher-type test suggested by Choi (2001) and that of Im, Pesaran, and Shin (2003). We then re-estimate the model using a three-step feasible generalized least squares estimator (FGLS) to correct for autocorrelation.
Following Roodman (2009), our final estimator is a panel-robust two-step Blundell-Bond system GMM with forward orthogonal deviations, both first differences and levels of the independent variables as standard instruments, and a GMM-style instrument that collapses all available lags of the lagged dependent variable for each time period into one moment. This allows for correcting both autocorrelation and imbalances in the panel. Dynamic panel data analysis helps reduce, if not resolve, key econometric problems often arising from empirical studies that use conventional cross-sectional or time-series datasets. The large number of data points increases the efficiency of econometric estimates and, by utilizing information on both the inter-temporal dynamics and the individuality of the entities being investigated, it better controls for the effects of missing or unobserved variables (Hsiao, 2003). Also, by following facilities over a 13-year time span, we can construct a proper recursive structure to study the before-after effect, addressing concerns over the short-lived nature of PBF-induced increases in performance (Maynard, 2012; Oxman & Fretheim, 2008).
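The forward orthogonal deviations transform mentioned above can be sketched as follows. This Python example covers only the transform for a single facility's series, not the GMM estimator itself; the function name is hypothetical and missing years are represented by `None`. Each observation has the mean of all its available future observations subtracted and is then rescaled.

```python
import math


def forward_orthogonal_deviations(series):
    """Subtract from each observation the mean of its available future
    observations, scaled by sqrt(T / (T + 1)) with T future observations."""
    transformed = []
    for t, value in enumerate(series):
        future = [v for v in series[t + 1:] if v is not None]
        if value is None or not future:
            transformed.append(None)  # undefined without data or a future
            continue
        scale = math.sqrt(len(future) / (len(future) + 1))
        transformed.append(scale * (value - sum(future) / len(future)))
    return transformed


complete = forward_orthogonal_deviations([1.0, 2.0, 3.0])
gappy = forward_orthogonal_deviations([1.0, None, 3.0])
```

In the series with a gap, first-differencing would lose both remaining observations because each difference needs the missing middle year, whereas the transform above still yields a value for the first year. This is why it is the usual choice for unbalanced panels.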

Results
We estimate the effect of PBF on productivity using the traditional two-stage DEA and SFA approaches. The linear program in the first DEA stage produces widely varying efficiency scores $\theta$ across facilities and years, with mean value $\mu_\theta = 0.439$ and standard deviation $\sigma_\theta = 0.240$. In words, the health facilities have on average been producing only 44% of the output that they could potentially produce, given their inputs and the efficiency of the best-performing units (similarly estimated efficiency scores of healthcare facilities in developed countries, as reported by Hollingsworth (2008) and O'Neill, Rauner, Heidenberger, & Kraus (2008), typically fall between 80% and 90%). In the second double-bootstrapped maximum-likelihood stage, we estimate that the introduction of PBF increases efficiency $\theta$ by 20.1 percentage points (i.e. by 45.8%), or almost one standard deviation (Table 2, column 1).
Though estimated simultaneously, the SFA approach can effectively be thought of as a two-stage process which first estimates a production frontier corrected for inefficiency $u$, and then regresses $u$ on PBF. We estimate that the introduction of PBF reduces the initial $u$ of 0.817 by 0.476 (Table 2, column 2). Since the frontier is modeled as a translog function, these estimates require a simple arithmetic manipulation to be directly comparable to our linearly estimated DEA results: the inefficiency $u$ of 0.817 is equivalent to an efficiency $\theta = e^{-u}$ of 44.2%, and the PBF-induced reduction in $u$ by 0.476 is equivalent to an increase in $\theta$ of 26.9 percentage points (i.e. 60.9%).
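The arithmetic behind this comparison can be reproduced in a few lines of Python. The sketch assumes that efficiency is recovered from the inefficiency term as exp(-u), which is what the log-linear frontier implies and what matches the reported figures; the variable names are illustrative.

```python
import math

# SFA: convert the inefficiency term u into an efficiency score exp(-u).
u_before = 0.817
u_reduction = 0.476
eff_before = math.exp(-u_before)
eff_after = math.exp(-(u_before - u_reduction))
sfa_gain_points = 100 * (eff_after - eff_before)      # percentage points
sfa_gain_relative = 100 * (eff_after / eff_before - 1)  # percent

# DEA: the second-stage effect is already measured in efficiency units.
dea_mean_efficiency = 0.439
dea_gain_points = 20.1
dea_gain_relative = 100 * (dea_gain_points / 100) / dea_mean_efficiency

print(f"SFA efficiency before PBF: {100 * eff_before:.1f}%")
print(f"SFA gain: {sfa_gain_points:.1f} points ({sfa_gain_relative:.1f}%)")
print(f"DEA gain: {dea_gain_points:.1f} points ({dea_gain_relative:.1f}%)")
```

Note that the relative SFA gain equals exp(0.476) - 1 regardless of the starting level of inefficiency, whereas the gain in percentage points depends on where the facility starts.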
Result 1. The efficiency of healthcare facilities in our sample is on average 44%, or about half that of health facilities in developed countries.
The SFA procedure is more robust to noise in the data than DEA, but both techniques are likely to bias the estimated gains in efficiency (46-61%) upwards due to their static nature. We thus continue investigating the effect of PBF parametrically. We start by presenting a naïve pooled OLS model (Table 3, column 1). A Wald test confirms the presence of facility-level fixed effects ($F(215, 1677) = 5.07$ is significant at the 1% level), and a cluster-consistent Hausman-type test following Arellano (1993) reveals that a random effects model would be inconsistent ($\chi^2(10) = 28.251$ is significant at the 1% level). We additionally control for a possible time trend by including fiscal year fixed effects (Table 3, column 2). A Wald test confirms the joint significance of fiscal year fixed effects ($F(12, 215) = 12.65$ is significant at the 1% level). Following Wooldridge (2002), we reject the null hypothesis of no first-order autocorrelation in the panel ($F(1, 192) = 61.532$ is significant at the 1% level). We verify the assumption of stationarity through a series of ADF tests, all of which reject the null hypothesis of non-stationarity at the 1% significance level. To correct for first-order autocorrelation, we therefore estimate the model using an FGLS estimator (Table 3, column 3).
These parametric models produce large, significant effects of PBF on output which, apart from the naïve OLS result in column 1, are very much in line with the estimates from our previous DEA and SFA analyses (49-51%). However, so far our estimates are static, and do not account for the dynamic nature of the data: output last year is likely to be a fundamental determinant of output in the current year. Our final and favored estimator is therefore a system GMM (Table 3, column 4). The magnitude of the estimated effect of the introduction of PBF on healthcare output decreases drastically to 27%. Its statistical significance also decreases in the process, but the effect remains significant at the 5% level. This is consistent with Scott and Coote (2010), who assert that healthcare provision is an inherently dynamic system, with current output and efficiency predicting future ones. Importantly, the increase is systematic and similar across individual indicators. Columns 1 to 3 of Table 4 show that output in terms of outpatients, inpatients, and immunizations increases significantly in a similar fashion to the overall effect on the SUO. Column 4 reports the results for deliveries. Here, too, the coefficient is consistent with the other estimates. However, as can be seen from the table, this estimation fails the Arellano-Bond test for AR(1) in first differences, as well as the Hansen test for instrument exogeneity. This implies that the moment conditions used in column 4 are not valid, and that the results should therefore be interpreted with care.[11] Taken together, the findings of Table 4 indicate that the expansion of total output did not come as a trade-off between different outputs, but rather as a shift in the total amount of medical services performed.[12]
Result 2. The introduction of PBF robustly increases output through improved efficiency of healthcare provision by at least 27%, or over a third of a standard deviation.

Perceived quality
A 27% increase in output is far from modest, especially given that it stems from a financial incentive scheme worth only about 5% of the total income of the participating facilities, and that we control for the increase in budget. It is therefore quite plausible that such a sudden rise in output could come at the expense of quality. For instance, to maximize revenue, health staff could perform services hastily. Such a response to the incentives would suggest that there is an opportunity cost in terms of quality of service. If the staff are aware of this implicit trade-off, it could indicate an erosion of professionalism and intrinsic motivation. To see whether this might be the case, we conducted a two-wave patient satisfaction survey in the PBF facilities and in a sub-sample of those without PBF.
While patient satisfaction is subjective, several studies endorse its validity as an instrument for measuring quality of healthcare (Andaleeb, 2001; Davies & Ware, 1988; Johansson, Oléni, & Fridlund, 2002) and it has recently been placed at the core of policy recommendations regarding PBF in the United States (Ryan, 2009; Wharam et al., 2009). Leonard (2008) shows that satisfaction is jointly produced with quality during the course of a consultation and that patients respond to increased quality by being more likely to be satisfied. Moreover, patient satisfaction reflects both process quality and clinical quality (Marley, Collier, & Meyer Goldstein, 2004), making it a good measure of the overall quality of healthcare delivery. That said, increased efforts by health facilities to improve the experience of patients and persuade them to come back (an intermediary variable in the PBF theory of change) may weaken the correlation between patient satisfaction, perceived quality and actual quality of service in our setting. In other words, this analysis should be seen as an effort to assuage the concern that patient care was neglected in favor of output growth, rather than as evidence of changes in intrinsic motivation.
Notes: (Cluster) robust (bootstrapped) standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01.
11 An alternative dynamic estimate that allows for first-order moving-average MA(1) errors solves this problem and reveals quantitatively similar results.
12 We do not report the effect of PBF on antenatal care visits as our model presents strong evidence against the null hypothesis that the overidentifying restrictions are valid (although the Sargan test has a tendency to underreject). Rejecting this null hypothesis requires reconsidering the model or instruments, and we could not find suitable substitutes.
We use the output and expenditure data along with several other characteristics to match the six PBF facilities with six similar ones receiving fixed funding, identified among other facilities in linguistically and culturally affine areas. The matches are based on a propensity score calculated from a set of indicators not constituting a building block of the SUO (the facilities' income and expenditures, catchment population, number of staff, bed capacity, number of diagnostic procedures and minor surgical operations carried out, average length of stay of admitted patients, number of fatalities, and the availability of mental counseling), as measured in the last year prior to the introduction of the scheme. By matching on values collected before the implementation of PBF, we ensure that the propensity scores are not influenced by any potential confounding effects of the intervention.
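The matching step can be illustrated with a minimal sketch of one-to-one nearest-neighbor matching on a propensity score, without replacement. The text does not specify the matching algorithm or how the scores were estimated, so the greedy procedure, the function name and all scores below are assumptions made purely for illustration.

```python
def match_nearest(treated, controls):
    """Greedy 1:1 nearest-neighbor matching on a propensity score.

    treated and controls map unit names to (hypothetical) propensity scores.
    Each control can be used at most once (matching without replacement).
    """
    available = dict(controls)
    pairs = {}
    # Process treated units in ascending score order for a deterministic result.
    for unit, score in sorted(treated.items(), key=lambda item: item[1]):
        best = min(available, key=lambda c: abs(available[c] - score))
        pairs[unit] = best
        del available[best]  # remove the matched control from the pool
    return pairs


# Hypothetical pre-intervention propensity scores (not from the paper).
pbf_facilities = {"T1": 0.62, "T2": 0.35, "T3": 0.80}
candidate_controls = {"C1": 0.30, "C2": 0.58, "C3": 0.77, "C4": 0.10, "C5": 0.70}

pairs = match_nearest(pbf_facilities, candidate_controls)
for treated_unit, control_unit in sorted(pairs.items()):
    print(treated_unit, "->", control_unit)
```

Greedy matching is only one of several possible designs; optimal matching or caliper restrictions would follow the same logic but may pair units differently when scores are close.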
To gauge the perceived quality of the services in these facilities, we adapted the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey (Centers for Medicare & Medicaid Services, 2011) to the Ugandan environment and administered it in two waves in 2012 and in 2014. 13 To address concerns that perceived quality might be highly dependent on the relative differences vis-à-vis the nearest available alternatives, each of the units was further matched with the nearest similarly-sized public facility as well as a village half-way between the catchment areas of each private-public pair, resulting in a final sample of 24 facilities and 12 neighboring villages.
Notes: Robust SE in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. Inputs: expenditures, staff, capacity. To avoid proliferation of instruments in the system GMM model, we take the spatial fixed effects one level up within their nested structure, using diocese- instead of facility-level fixed effects.
In the villages, we randomly selected 12 adult respondents from a previously recorded household census. At the facilities, accidental sampling was used instead, interviewing the first 10 people exiting the facility on an unannounced day. In total, 384 interviews were carried out in each wave. Excluding respondents who had not visited the Catholic facilities in the three years prior to the interview resulted in 430 non-incidentally truncated interviews. From answers to a set of questions regarding various aspects of perceived quality of service, we factor out an index of perceived quality (quality), scaled from 0 (most dissatisfied) to 1 (most satisfied). Other personal-level confounding characteristics measured through the survey include a physical health index (health) based on the SF-12 Health Survey as proposed by Ware, Kosinski, and Keller (1996), an asset index (assets) obtained by principal factor analysis (Sahn & Stifel, 2003) approximating wealth, education (primary; a dummy equal to one if the respondent completed primary education), sex (female; a dummy equal to one if the respondent is female), and age (age) in years. Finally, govt is the mean perceived quality of the matched state-owned facility, measured in the same way as quality. By using the mean perceived quality of the main public facility rather than the quality perceived by each respondent, we de facto control for the relative reputation of the public competitor. As can be seen in Table 5, the sample was well balanced across all the confounding characteristics (except for govt) in 2012. Lacking a baseline survey, we cannot estimate the effect of the introduction of PBF on the perceived quality of healthcare provision in a difference-in-differences (DID) setting. We can, however, use the DID approach to compare trends in PBF and non-PBF facilities after the new financing system was introduced and after the management and staff of the facilities had presumably gotten accustomed to it.
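The interview counts reported for the survey can be checked with a few lines of arithmetic. The sketch below assumes that 12 respondents were drawn in each of the 12 villages and 10 at each of the 24 facilities, and that the 430 retained interviews are pooled over both waves; these readings are consistent with the reported totals but are not stated explicitly in the text.

```python
# Sampling design as read from the text (per-site counts are an assumption).
villages, respondents_per_village = 12, 12
facilities, respondents_per_facility = 24, 10
waves = 2

per_wave = villages * respondents_per_village + facilities * respondents_per_facility
total_interviews = per_wave * waves

# Interviews kept after excluding respondents with no recent visit
# to the Catholic facilities (assumed to be pooled over both waves).
retained = 430
share_retained = retained / total_interviews

print(per_wave, total_interviews, round(share_retained, 2))
```

Under these assumptions, a little over half of all interviews enter the analysis of perceived quality, which is why the check for self-selection is relevant.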
A meaningful measure of perceived quality of the health facilities can only be obtained from respondents who had recently received treatment there. This can potentially introduce a serious self-selection bias. Before proceeding to a full DID model specification for comparing the trends in perceived quality of healthcare delivery in PBF and non-PBF facilities, we therefore first estimate a bivariate sample-selection model as proposed by Heckman (1979) to check for such bias. We then estimate the PBF trend effects using a simple DID model with cluster-robust standard errors on the sub-sample of non-incidentally truncated interviews, gradually adding covariates until reaching the fully specified model. In that model, FU is a dummy equal to one for all observations from the 2014 follow-up, X is a vector of production factors proxying the size of the private facility (expenditures, staff and beds), and Z is a vector of respondent characteristics comprising health (a physical health index), assets (an asset index approximating wealth), primary (a dummy equal to one if the respondent completed primary education), female (a dummy equal to one if the respondent is female), and age (respondent's age in years). govt is the mean perceived quality of the nearby state-owned health facility, η are location type fixed effects indicating whether the interview took place at a private facility, a public facility, or a village in between the two, and ε is a stochastic error term.
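For orientation, a specification consistent with the variable definitions above could be written as follows. This is a sketch under assumed notation: the indices (i for respondent, f for private facility, t for survey wave, ℓ for location type), the coefficient symbols, and the use of η and ε for the location-type fixed effects and the error term are illustrative choices, not necessarily the authors' own.

```latex
\[
\mathit{quality}_{ift} = \alpha
  + \beta_1\,\mathit{PBF}_f
  + \beta_2\,\mathit{FU}_t
  + \beta_3\,(\mathit{PBF}_f \times \mathit{FU}_t)
  + X_{ft}'\gamma
  + Z_{ift}'\delta
  + \theta\,\mathit{govt}_{ft}
  + \eta_{\ell}
  + \varepsilon_{ift}
\]
```

In this notation, \(\beta_2\) captures the change in perceived quality between 2012 and 2014 common to all facilities, and \(\beta_3\) the additional change in PBF facilities, which is the trend difference of interest.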
In Table 6 we first estimate a Heckman bivariate sample-selection model to check for self-selection into visiting the private facilities. While some of the observed respondent characteristics affect the likelihood of visiting the private facility within 3 years prior to the interview, the coefficient on the inverse Mills ratio is statistically insignificant (Table 6, column 1). In other words, we find no evidence that self-selection into the sample biases the coefficient estimates in the second stage. It is therefore reasonable to estimate the PBF trend effects using a simple DID model with cluster-robust standard errors on the sub-sample of non-incidentally truncated interviews. Starting with a naïve specification which only includes the DID terms and size controls (Table 6, column 2), we gradually include respondent characteristics (Table 6, column 3), the mean level of perceived quality of the nearest similarly-sized public facility, and controls for the location and timing of the interview (Table 6, column 4).
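The inverse Mills ratio used in this check is a simple function of the first-stage probit index: the standard normal density divided by the standard normal distribution function. The sketch below computes it with the standard library only; the index values are hypothetical, and the first-stage probit itself is not estimated here.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """Inverse Mills ratio for a selected observation with probit index z."""
    return normal_pdf(z) / normal_cdf(z)


# Hypothetical first-stage probit indices (not estimates from the paper).
for index in (-1.0, 0.0, 1.0):
    print(index, round(inverse_mills(index), 4))
```

The ratio shrinks as the probability of selection rises. In the second stage it enters as an additional regressor, and an insignificant coefficient on it is what the text reads as a lack of evidence for selection bias.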
The estimates of perceived quality are stable across the specifications. Perceived quality of healthcare increased between 2012 and 2014 in the PBF as well as non-PBF facilities (see coefficients on FU in Table 6). There is also some evidence that the increase was much more pronounced in the case of PBF facilities (see coefficients on PBF × FU in Table 6). While we do not have quantitative data on staff motivation, we substantiated this result by conducting in-depth interviews with managerial as well as rank-and-file staff in the PBF facilities. These interviews suggest that the additional salary bonuses which became possible thanks to PBF made staff feel more appreciated for their efforts. We interpret these results as indicative evidence that PBF did not lead to a decline in quality and, if anything, increased the pace of improvements in perceived quality. However, as mentioned earlier, this does not allow us to exclude possible negative changes in the performance of the services, either unnoticed by the patients or not captured by our survey instrument.
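The logic of the trend comparison can be sketched with group means: the coefficient on FU corresponds to the change in the non-PBF group, and the coefficient on the interaction of PBF and FU to the additional change in the PBF group. The quality scores below are hypothetical and the sketch omits all covariates and clustering, so it illustrates the mechanics only, not the estimates in Table 6.

```python
def group_mean(records, pbf, fu):
    """Mean quality score for one PBF-status and survey-wave cell."""
    values = [q for p, f, q in records if p == pbf and f == fu]
    return sum(values) / len(values)


# Hypothetical records: (PBF dummy, follow-up dummy, quality index in [0, 1]).
records = [
    (0, 0, 0.60), (0, 0, 0.64),
    (0, 1, 0.66), (0, 1, 0.70),
    (1, 0, 0.58), (1, 0, 0.62),
    (1, 1, 0.72), (1, 1, 0.76),
]

# Change between waves in each group.
trend_non_pbf = group_mean(records, 0, 1) - group_mean(records, 0, 0)
trend_pbf = group_mean(records, 1, 1) - group_mean(records, 1, 0)

# Difference in trends, the analogue of the interaction coefficient.
did_estimate = trend_pbf - trend_non_pbf
print(round(trend_non_pbf, 2), round(trend_pbf, 2), round(did_estimate, 2))
```

Because both waves were collected after PBF was introduced, a positive difference of this kind describes diverging trends rather than the causal effect of introducing PBF.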
Result 3. Efficiency improvements resulting from PBF do not reduce the quality of care as perceived by patients.

Discussion and conclusions
Using a panel of output and expenditure data from small healthcare facilities in Uganda, we estimate the contribution of performance-based financing towards achieving greater efficiency in faith-based nonprofit healthcare delivery. This is the first study on PBF to focus explicitly on RNPOs. The main strength of our analysis comes from using a combination of parametric and nonparametric econometric techniques. We rely on a dynamic panel estimation to verify the robustness of more traditional two-stage DEA and SFA procedures. To the best of our knowledge, this is the first time frontier efficiency methods have been used to assess a PBF intervention. We find that the technical efficiency of the health facilities in our sample is on average low at 44%. This number could potentially be underestimated considering the quality of our dataset and the susceptibility of DEA to biased outliers. It is about half of the efficiency typically found in health facilities in developed countries (Hollingsworth, 2008; O'Neill et al., 2008), and somewhat lower than the efficiency scores found elsewhere in Africa (Akazili, Adjuik, Jehu-Appiah, & Zere, 2008; Kirigia, Emrouznejad, Sambo, Munguti, & Liambila, 2004; Masiye, 2007; Renner et al., 2005; Tlotlego, Nonvignon, Sambo, Asbu, & Kirigia, 2010; Zere et al., 2006). Nevertheless, the large performance differences we find between individual facilities in our sample, with a standard deviation of 24%, which is larger than in developed countries (12%; Hollingsworth, 2008) but largely comparable to those elsewhere in Africa (between 23% and 33%; Kirigia et al., 2004), suggest that there is significant room for improvement. We indeed find just that, observing that healthcare providers respond strongly to targets by increasing output through improved efficiency. In our case, output rose by at least 27%. The result is statistically and economically significant and in line with that of other studies of PBF.
Moreover, the increased efficiency does not come at the expense of perceived quality. Efforts to improve the performance of health facilities seem to pay off handsomely.
Jointly, we interpret these results as evidence that faith-based healthcare is far from its productivity frontier and responds to extrinsic incentives. This does not disprove that faith-based healthcare is intrinsically motivated. Quite the opposite: we believe that intrinsic motivation has played a vital role in all healthcare provision since the times of the Hippocratic Oath, regardless of ownership status. Our results point precisely to the fact that faith-based nonprofit, public and private for-profit outfits are probably not that different after all. Intrinsic motivation may still be important, but we provide evidence that extrinsic incentives have an even greater role to play in increasing overall efficiency. 14 Qualitative interviews with staff in PBF facilities, in line with evidence from several recent evaluations of PBF programs (Olafsdottir et al., 2014; Bertone et al., 2016; Lohmann et al., 2018), suggest quite the opposite effect of crowding-out: PBF made staff feel appreciated for their efforts, and played an important role in increasing output. If well designed, performance-based financing can help mission-driven healthcare services do more at a lower unit cost, probably without crowding out motivation.
Existing literature raises several other concerns about the potential pitfalls of introducing a PBF scheme, all very salient to our context and analysis. Lu (1999) points out that incentivized output targets may encourage facility administrators to overreport output. While this is in general possible, we are confident that such misreporting is unlikely in our case. UCMB datasets have long been used at the national level as an advocacy device, requiring high levels of reliability, and as an internal means of performance comparison across facilities, inducing all facilities to minimize unintended under-reporting regardless of the financing mechanism at play. Moreover, unlike other healthcare providers in Uganda, UCMB engages in monthly data verification and monitoring activities throughout the country. To spot any possible errors in the reporting, it also activated additional post-episode-of-care verifications at the community level for PBF facilities.
Another potential issue arises from the way in which the targets are aggregated from partial output indicators. If the relative weight given to individual output indicators does not reflect their actual cost in terms of factors of production, an introduction of weighted output targets such as the present one defined in terms of the SUO could distort the balance of provided services, potentially resulting in the production of services with negative marginal value (Wynia, 2009). However, the three output indicators which constitute the largest part of total SUO production in observed facilities, namely the number of outpatients (52% of total SUO production), inpatients (36%) and immunizations (8%), all rose at roughly the same rate following the introduction of PBF. Further research would benefit from an earlier engagement in data collection on quality of service and intrinsic motivation, to observe actual changes introduced by PBF. Also, a pipeline randomization of the roll-out of the interventions would assuage some of the concerns about non-random placement. Finally, while our study shows that PBF has the potential to increase efficiency of faith-based healthcare delivery substantially, it has little to say about which incentive structures are best suited to do so. Future interventions could vary incentives to test which are most effective in mission-driven healthcare.
14 In fact, while the output increases observed by Reinikka and Svensson (2010) rely on a budget expansion, our results net out the effect of budgetary increases. It is important to stress that any effect we estimate is budget-neutral, in that it removes the effect on output that can be directly derived from the increase in budget. This is especially salient given that the budget available for PBF in our setting may be small relative to other similar interventions. See Soeters (2017) for an overview of PBF standards.