Socio-Economic Planning Sciences

Environmental sustainability and service quality beyond economic and financial indicators: A performance evaluation of Italian water utilities

G. D'Inverno et al.

As water utilities operate as natural monopolists and provide services essential for human life, their activities are regulated by public authorities. The sustainable use of water resources and specific attention to social needs should be essential goals for this kind of firm, so that the evaluation of their business should go beyond profitability and financial solvency. Keeping pace with the new Circular Economy paradigm and the evolution of the water regulatory framework, in this paper we suggest a global composite indicator that evaluates water utilities' performance in a novel way, encompassing financial and economic indexes together with environmental sustainability and service quality measures. To show its empirical implementation, we evaluate the performance of Italian water utilities. The operating context is also under scrutiny, focusing on specific water utility features such as size, geographical location, degree of diversification and ownership. In this light, operating in the Centre and being large are found to be favourable background conditions, while being in the South and being medium-sized display a significant unfavourable influence on water utility performance. Multi-utilities are advantaged with respect to mono-utilities, and no significant distinction can be made among the different ownership models.


Introduction
At the beginning of this century, water was labelled the ''blue gold'', to underline the increasing pressure on water resources. On the other hand, as water is an essential requirement, its affordability should be guaranteed to everybody. This is one of the key goals defined by the United Nations (UN) in the 2030 Agenda for Sustainable Development (Goal 6), together with the necessity of a sustainable management and an efficient use of water as a natural resource (Goal 12). In compliance with Goal 6, the UN General Assembly has proclaimed the period from 2018 to 2028 the International Decade for Action ''Water for Sustainable Development'' (Resolution 71/222). It started on World Water Day 2018, 22 March 2018, and it will end on World Water Day 2028. With this Resolution, the UN has established that ''the objectives of the Decade should be a greater focus on the sustainable development and integrated management of water resources for the achievement of social, economic and environmental objectives''. The accomplishment of the ambitious objectives of Agenda 2030 can be eased by the good environmental issues and the water authorities that have entrusted the water management to the water utilities. While in the past water utilities were just supposed to ensure service provision and to guarantee their economic and financial sustainability, nowadays they must be socially and environmentally responsible too [40]. This clearly comes to light when looking at the evolution of the water regulatory system at both national and international level. It is commonly acknowledged that performance evaluation is a key tool for regulators and that different techniques can be used, from models including key performance indicators, such as scorecards, to models with overall performance indicators derived from financial ratios or efficiency estimations [30].
Defining best practices helps the authorities in setting tariffs and quality standards, in monitoring service provision and in promoting improvements in several directions, such as investments, environmental impact, customer satisfaction and cost efficiency (see for example [18,39,41,43,51,54]). In line with the changes in the institutional framework of the water sector and the introduction of the new Circular Economy paradigm, different approaches to water utility performance evaluation have to be adopted. As a firm, a water utility has to be evaluated according to economic profitability and financial solvency criteria, exploiting the data available from the balance sheets to assess how well it is managed in terms of economic and financial sustainability. Although these criteria play a crucial role in water utility assessment, they are not representative enough to capture the whole picture of water utility performance. The quality of the service and environmental sustainability have to be considered too. In this light, water regulators are now linking tariffs to service quality [44,45]. Specifically, the Italian water regulator has also introduced a new penalty-based tariff method for those companies that do not achieve the assigned target in terms of service quality [61].
In the literature, the notion of quality has been interpreted in different ways, namely the quality of the drinking water supplied, the environmental issues, water losses, unaccounted-for water and the customers' perspective. Referring to the latter, several variables have been considered. Among them, it is worth citing the number of customers' complaints, the value of penalties paid for lack of service quality [46] and unplanned interruptions [45]. Recently, Romano et al. [61] have used as service quality proxies the promised standards the water utilities have undertaken to comply with, for example in terms of target time to reach all the consumers and to guarantee the water provision. From an environmental point of view, water losses are among the most used indicators (see for example [31] and [26]). Physical water losses along the distribution network are strictly linked to the risk of water crisis. Even though a certain level of water losses is considered unavoidable, water utilities should make efforts to reduce them. In drought periods, wasting water is socially unacceptable. In the end, customers and more generally citizens will pay for water losses [38]. Moreover, according to the International Water Association [32], the reduction of water losses should be a necessary step in the transition towards the circular economy. Many papers are related to cost efficiency analysis and include either environmental issues or customer satisfaction indicators. In every case, quality matters [53]: an efficiency analysis that pays no attention to quality would penalize those water utilities which make economic efforts to provide a better service. It is worth underlining that few papers include economic, environmental and social issues together. In this stream, the analysis of the water sector has taken place in a multicriteria framework.
To capture the multifaceted aspects of performance evaluation, global Composite Indicators (CIs) have been proposed and different methodologies to build them have been suggested (see among others [22,40,43,51,52,54]). In this paper, we contribute to the literature in two main directions. First, motivated by the peculiarities of the water sector and by the evolution of the regulatory system, we suggest a holistic approach to utility performance evaluation to provide decision makers with a comprehensive tool for benchmarking purposes. Specifically, we propose an innovative use of a composite indicator. Second, we show its practical implementation by giving empirical evidence from the Italian water sector. In line with part of the literature on performance evaluation (see [20,36,51,70], among others), we opt for a Benefit-of-the-Doubt (BoD) approach in the construction of the composite indicator. The global index is obtained by taking a suitable weighted sum of the single performance indicators. In this case, the weights are chosen in the most favourable way for the water utility under evaluation. Accordingly, no firm can complain about being evaluated unfairly or about the choice of the weights. We consider a modified version of the BoD model proposed by Zanella et al. [73] to handle the presence of performance indicators with negative values and the so-called ''undesirable performance indicators'', for which higher values denote a lower level of performance (see also [47]). The suggested BoD model allows stakeholders' preferences to be included by imposing weight restrictions. Moreover, we consider the robust and conditional version of the outlined composite indicator to mitigate the influence of potential outliers, to directly account for the operating context and to explore its role in water utility performance [13,15,25,56].
Such a model is applied to 93 Italian water utilities and, as far as we know, this is the first analysis in a global perspective for the Italian case. Beyond the economic and the financial indicators, environmental sustainability and the service quality are taken into account by including water losses, the target time to realize a new connection and the target time to repair ordinary breakdowns. The obtained global index is defined as the Water Utility Performance Composite Indicator (WUP-CI). Finally, the paper gives a contribution to the intense and divisive debate about the influence of the following background condition variables: the size, the ownership, the geographical location and the distinction between mono-and multi-utility. Our analysis shows that operating as a large firm displays a favourable influence on the water utility performance, while the middle size appears the worst one. Referring to the geographical location, being in the Centre is detected as a favourable background condition while being in the South relates negatively to the performance. In line with a part of the recent literature, ownership does not display a significant difference among the management models. As a final evidence, being a multi-utility has a more favourable influence on water utility performance than mono-utility.
The remainder of the paper is organized as follows. Section 2 gives a short review of the Italian water sector and then describes the data. The data are analysed according to the methodology described in Section 3, and the results are shown and discussed in Section 4. Section 5 contains the concluding remarks.

The water utilities framework
Before going into the methodological and technical details of the Water Utility Performance Composite Indicator (WUP-CI), let us introduce the water utility industry and the Italian context, along with the data.

The water utility industry
The water industry in Italy has experienced significant changes during the last decades. In the Nineties a broad reform was launched, that fostered an industrialization process and introduced the so-called ''Ambito Territoriale Ottimale'' (ATO) namely Optimal Territorial Area for the governance of the industry as optimal geographic portions of the regions in accordance with the river basins [5,12,30]. The local supervision of the ATOs was entrusted to local water regulatory authorities. Notwithstanding the reform, the Italian water industry remained highly fragmented, with different management models for thousands of entrusted water operators, with very different features in terms of ownership, size and strategies: from direct management by municipalities or community-owned water supplies, to delegated management to publicly, mixed or privately owned utilities, from very small to big utilities listed on the stock exchange, from monoutilities to multiutilities, from operators that focused only on some water service (for example collection and distribution, sewerage or wastewater treatment) to operators that manage the integrated water services [29].
During the last ten years, a broad debate on the best way to manage the water industry has involved Italian citizens. After the Italian government passed laws in 2008 and 2009 encouraging management by mixed or totally private water operators, a widely attended referendum was held in 2011; as a result, direct and delegated management by publicly owned firms is still permitted and privatization is no longer compulsory [11,29]. After the referendum, many drawbacks still exist, together with a strong need for investments to overcome the main weaknesses, among which are high leakages, inadequate wastewater treatment, service interruptions and non-drinkable water [29]. So, in 2011 a National Water Authority, now called Autorità di Regolazione per Energia Reti e Ambiente (ARERA), was entrusted with supervising and regulating the water industry, with the aim of defining a homogeneous national framework in terms of tariff method and service contract type and of supervising the role of the local water authorities, which are directly responsible for supervising the operators that locally manage the water services [61]. Since 2012 ARERA has introduced new tariff methods to encourage investments and improvements in the quality and quantity of water services and to reduce differences among local contexts [59]. Moreover, it has introduced standards for the quality of services in terms of contracts and technical requirements, with tariff incentives or penalties for operators that do or do not meet the set targets [61]. These standards and targets concern, among others, the reduction of water losses, to improve the efficiency of water use and to reduce environmental impacts, and the respect of adequate target times to provide relevant services such as new connections and breakdown repairs.

Data
In line with the above-mentioned regulatory changes, water utilities have to operate in a new and more demanding institutional framework. As a direct consequence, their assessment should involve several dimensions. Economic and financial indicators, available in the balance sheets, have long been considered performance measures of the units under assessment [30,57,60,67]. However, additional indicators are needed when it comes to the evaluation of a unit facing environmental challenges and operating in a regulated market, as the water industry is. Beyond the economic and financial status, two more aspects deserve to be accounted for to get a more comprehensive picture of water utility performance and to support decision making. On the one hand, the environmental aspect needs to be accounted for, given the worldwide commitment and the pressure towards a sustainable management of water resources. On the other hand, the social dimension also has to be considered, so that consumers are not the only stakeholders bearing the burden of these new challenges, in the form of higher bills or of poor service and delays.
In the current study we consider the Italian framework to show the relevance of such a multidimensional and integrated evaluation. The ''drought emergency'' and the penalties for infringements of European law make Italy an interesting case study to account respectively for environmental and social sustainability issues, in addition to the business indicators commonly used in company performance evaluation. The sample under analysis comprises 93 water utilities for the year 2013. To preserve homogeneity, we focus on those operators that are neither municipalities nor public bodies, but only independent companies. Moreover, to ensure homogeneity of the data, we consider only those utilities providing all five main water services (collection, adduction/transportation, distribution of water for civil use, sewerage and wastewater treatment) and reporting all the information for the relevant variables.
Due to the different dimensions involved in the assessment, we assembled an exceptional dataset collecting information from several sources. For the economic and financial items, we use AIDA-Bureau van Dijk data (https://aida.bvdinfo.com/) and we rely on the most well-known and commonly used indicators. As economic profitability indexes, we observe the Earnings Before Interest, Tax, Depreciation and Amortization (EBITDA) margin as a proxy for the cash profit. We also consider the Return On Assets (ROA) and the Return On Equity (ROE) to evaluate how well the company converts respectively assets and investments into net income. Since the net income can be either positive (profits) or negative (losses), these two indicators might display negative values that need to be dealt with in the model formulation. As financial indicators, we use two autonomy indexes to look at the company's solvency, one with respect to its own assets (Financial autonomy) and one with respect to debts from third parties (Autonomy from third parties). More generally, and beyond the current application, whenever a company's performance is assessed on the basis of the information available from balance sheets, economic or financial indicators may take both positive and negative values, and handling them is not trivial.
For the environmental aspect, we include in our analysis the Percentage of water loss in the distribution pipes, obtained from ''Il Portale dell'Acqua'' (Italian Water website), which is available online (http://www.acqua.gov.it/) and was set up by the Italian government with data about Italian water operators and municipalities. More precisely, the water loss ratio is obtained as the difference between the water introduced in the network and the water supplied, divided by the water introduced in the network and then expressed as a percentage. The numerator of the ratio represents the leakage from all parts of the network, the overflows at the utility's storage tanks and the water thefts. In our dataset we use the average water loss for each water utility, obtained as the mean of the water loss in all the municipalities served. This variable has been considered in water utility performance evaluation only in comparatively recent times and has been included to account for the quality of the service (for a literature review, we refer the interested reader to [61]). However, only in the last years have some authors started distinguishing between two dimensions of service quality, keeping the environmental sustainability aspect separate from the social one (see [1,45,46]). On the one hand, service quality refers to the environmental impact of the water utility activity, observing for example the quality of water in terms of chemicals in the outgoing water [72], the water losses [26] or the unaccounted-for water [34,53]. On the other hand, service quality comprises the customers' perspective on the delivered water services, expressed for example in terms of complaints, unplanned interruptions or time to rectify a sewer blockage. In the present paper, we consider the Target time to do a new connection (days) and the Target time to repair breakdowns (hours) as service quality indicators to measure the social sustainability of the water utility's activity.
These indicators have already been used by Romano et al. [61] with reference to the Italian water utilities operating in two different regions, Tuscany and Veneto, to measure the period for which many customers experienced service interruption inconvenience. The information about these two variables has been retrieved from the service chart, ''Carta del Servizio Idrico Integrato'', a document drawn up by each utility to report its commitment to providing a certain service standard, approved by the local water authorities and publicly available (in most cases, on the water utility's website). For the regulatory period 2016-2019, the Italian Water Authority has conceived an incentive mechanism which links the tariff to the fulfilment of certain quality standards [61]. Referring to the two indicators we consider, the Authority has defined the following targets: 15 days for the time to do a new connection and 3 h for the time to repair breakdowns (https://www.arera.it/it/docs/17/917-17.htm). As the descriptive statistics of Table 1 show (standard deviations are reported in parentheses), the water utility targets are often well beyond the standard set by the Italian Authority. At this point, it is necessary to reflect on these environmental and social sustainability indicators and on the way they are defined. Unlike the above-mentioned economic and financial indicators, a higher value of these indicators denotes a worse performance. A higher percentage of water losses points to a poorer water utility performance, harmful to the environment. The same applies to a water utility that reports a longer period of service interruption, detrimental for the consumers. Accordingly, the model formulation has to account for these differences, distinguishing between desirable indicators, whose greater values denote a better performance, and undesirable indicators, for which the opposite holds.
Generally, this reasoning can apply to any indicator, regardless of the classification in the economic, financial, environmental and quality dimensions. Table 1 shows the descriptive statistics of the water utility performance indicators selected for the current analysis. From an overall look, the multidimensional aspect of the analysis emerges together with the need of accounting for the different preferences of the stakeholders involved in the water industry and in the water utility's activity: the water local authorities, the customers and ultimately the water utilities.
Since the background conditions in which a water utility has to operate might influence its performance, we also consider a number of operating context variables, to ensure a fair benchmarking exercise and to investigate the influence these variables have in the performance assessment. The choice of the data depends on data availability and follows the related literature. Specifically, we investigate the relationship between water utility performance and some aspects that have been subject to reforms in many countries over the last 30 years but whose role is still under debate, namely the size, the ownership and the diversification [29]. In addition to these variables, the geographical location of the water utilities is also taken into account. We collect this information mostly from the websites of the water utilities. The size is calculated on the basis of the number of employees following the European Union parameters (https://ec.europa.eu/eurostat/web/structural-business-statistics/structural-business-statistics/sme): utilities with more than 250 employees are large, those with fewer than 250 and more than 50 employees are medium companies, and those with fewer than 50 employees are small utilities. As for the ownership, we distinguish the mixed-private companies from the public ones [49,57,60]. The degree of diversification distinguishes the mono-utilities from the multi-utilities. To conclude, we make a distinction between the water utilities located in the North, the Centre and the South of Italy [58,62]. Table 2 shows the descriptive statistics for the variable size and the distribution of the categorical variables (including the discrete version of size). Due to the high variability of ''size'' on a continuous scale, in the following analysis we include this variable in its discrete version, following the legislator's definition; the results of the analysis with the continuous variable are available upon request from the authors.
Furthermore, to get an intuition of how the performance indicators are distributed across the operating context variables, we break them down by group. Table 3 shows that the biggest water utilities exhibit higher means for the profitability indicators and lower means for the financial ones. There is no substantial difference for the water loss indicator, while on the quality side medium water utilities have higher means. Looking at Table 4, mixed-private companies basically show higher means for the economic indicators and there are no appreciable differences with respect to the other dimensions. Regarding the geographical location, Table 5 shows that water utilities in the South have lower means with respect to all the criteria. Moreover, Table 6 highlights that, from an economic perspective, four out of five indicators reveal greater means for mono-utilities than for multi-utilities. However, the situation is reversed with respect to water losses and one of the two quality indicators. The main advantage of the water utility performance Composite Indicator advocated in this paper is that it aggregates all these dimensions in the most favourable way for the units under analysis, giving the most weight to what they do best and the least weight to what they do worst. In this way, none of them can complain of being unfairly assessed, since more importance is assigned to what they perform better and the operating context is taken into account, as we are going to explain in the next section.
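The size classes and the group-wise breakdown of the indicators described above can be sketched as follows. This is only an illustration: the function names, the field names and the sample records are hypothetical, and the treatment of the boundary values (exactly 50 or 250 employees, which the text does not specify) is an assumption.

```python
from collections import defaultdict
from statistics import mean


def size_class(employees):
    """Size class from the number of employees, following the thresholds in the text."""
    # Assumption: the boundary values 50 and 250 are assigned to "medium".
    if employees > 250:
        return "large"
    if employees >= 50:
        return "medium"
    return "small"


def group_means(records, key, indicator):
    """Mean of `indicator` within each group defined by the field `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[indicator])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical records, not taken from the dataset used in the paper.
utilities = [
    {"employees": 30, "water_loss_pct": 40.0},
    {"employees": 120, "water_loss_pct": 35.0},
    {"employees": 180, "water_loss_pct": 45.0},
    {"employees": 600, "water_loss_pct": 38.0},
]
for u in utilities:
    u["size"] = size_class(u["employees"])

print(group_means(utilities, "size", "water_loss_pct"))
```

The same `group_means` helper could be reused with ownership, location or diversification as the grouping key.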

The model
In this section, we explain how to construct the Water Utility Performance Composite Indicator (WUP-CI), accounting for a number of issues raised in the analytical framework outlined so far.

A directional distance function composite indicator: Undesirable features, negative values and stakeholders' preferences
As we have already pointed out, the assessment of the water utilities' performance should take into account several dimensions, hence the need to create a Composite Indicator (CI) that encompasses the different criteria under evaluation. In multi-criteria decision analysis (MCDA), different methods have been proposed for the construction of Composite Indicators and then used in different contexts (see for example [51,74]). A large part of the literature on CIs is related to the ''weighted methods'', where a weight is assigned to each criterion and then a weighted sum (or even a product) is determined. In this framework, the weight choice is one of the most crucial aspects, and it might be subjective and therefore questionable. This happens especially when several different stakeholders are involved in the evaluation process and it is difficult to determine, a priori, a hierarchy among the criteria (for an extensive discussion on the weighting of composite indicators, please refer to [28]). The Benefit of the Doubt (BoD) approach appears to be a suitable methodology to address this issue. According to the BoD, weights are chosen in the most favourable way for the water utility under evaluation (from now on, we refer to a water utility as the unit under evaluation), so as to give more importance to what it does best and less importance to what it does worst. As the weight determination is data-driven, no evaluated unit can complain about a subjective and penalizing choice. In this perspective, weights are given by granting the water utility the benefit of the doubt [56]. In its first formulation, proposed by Cherchye et al. [10], the optimal unit-specific weights are determined so as to make the overall CI as high as possible (an upper bound is fixed at 1). For each unit $j_0$, the value of CI is obtained by solving the following linear programming problem:

$$
\begin{aligned}
CI_{j_0} = \max_{w_{r j_0}} \quad & \sum_{r=1}^{s} w_{r j_0}\, y_{r j_0} \\
\text{s.t.} \quad & \sum_{r=1}^{s} w_{r j_0}\, y_{r j} \le 1, \quad j = 1, \dots, n, \\
& w_{r j_0} \ge 0, \quad r = 1, \dots, s,
\end{aligned}
$$
where there are $s$ criteria and $n$ units under evaluation, $y_{r j_0}$ is the indicator related to the criterion $r$ for the unit $j_0$ and $w_{r j_0}$ is the associated weight. The BoD model can be seen as a particular DEA model where several outputs and just a dummy input whose value is fixed at 1 are considered. As it is a DEA-like model, BoD allows the performance of water utilities to be evaluated against a frontier consisting of best practice observations. From a practical point of view, the BoD model offers another advantage: the aggregate performance score is unaffected by a rescaling of any of the performance indicators, and this allows for the inclusion of ratio data in the analysis. In its first formulation, the BoD model accounts only for criteria that correspond to ''desirable'' indicators: the higher the value of the indicator, the better the performance of the unit under evaluation. In the present analysis, the quality of the service and the environmental sustainability are measured by ''undesirable'' indicators, that is, the higher their value, the worse the behaviour of the water utilities. To encompass also this second kind of indicators, Zanella et al. [73] propose a directional distance version of the BoD model where both the expansion of the desirable indicators and the contraction of the undesirable ones are simultaneously considered (see also [25,36,56]). This is done along the direction of a suitable vector $g = (g_y, g_b)$, where $g_y$ and $g_b$ give the direction for the desirable indicators and the undesirable ones respectively. In this new framework, for each unit, the composite indicator is computed by solving the following maximization problem:

$$
\begin{aligned}
\max_{\beta,\,\lambda} \quad & \beta \\
\text{s.t.} \quad & \sum_{j=1}^{n} \lambda_j\, y_{rj} \ge y_{r j_0} + \beta\, g_{y_r}, \quad r = 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j\, b_{kj} \le b_{k j_0} - \beta\, g_{b_k}, \quad k = 1, \dots, l, \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0, \quad j = 1, \dots, n.
\end{aligned}
\tag{1}
$$

Referring to the unit $j$, $y_{rj}$ is the $r$th desirable indicator ($r = 1, \dots, s$) while $b_{kj}$ is its $k$th undesirable one ($k = 1, \dots, l$). The objective function value $\beta$ at the optimal solution corresponds to the maximal feasible expansion of the desirable indicators and contraction of the undesirable ones.
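To make the weight-selection idea of the basic BoD model concrete, the sketch below solves it for the special case of two desirable indicators by enumerating the vertices of the feasible weight region. It is a toy illustration and not the procedure used in the paper: the function name `bod_score_2d` and the data are hypothetical, indicators are assumed to be positive so that the feasible region is bounded, and an application with more indicators would normally rely on a linear programming solver.

```python
from itertools import combinations


def bod_score_2d(units, target, eps=1e-9):
    """BoD score of `target` with two desirable indicators.

    Maximises w1*y1 + w2*y2 for the target unit subject to
    w1*y1j + w2*y2j <= 1 for every unit j and w1, w2 >= 0.
    The optimum of a linear programme lies at a vertex, so all
    intersections of pairs of constraint boundaries are checked.
    """
    # Each constraint is stored as (a1, a2, c), meaning a1*w1 + a2*w2 <= c.
    cons = [(y1, y2, 1.0) for (y1, y2) in units]
    cons += [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # w1 >= 0 and w2 >= 0
    best = 0.0
    for (a1, a2, c), (b1, b2, d) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundaries do not define a vertex
        w1 = (c * b2 - a2 * d) / det  # Cramer's rule
        w2 = (a1 * d - c * b1) / det
        feasible = all(p * w1 + q * w2 <= r + eps for p, q, r in cons)
        if feasible:
            best = max(best, w1 * target[0] + w2 * target[1])
    return best


# Hypothetical indicator values for three units.
units = [(1.0, 0.5), (0.5, 1.0), (0.5, 0.5)]
for u in units:
    print(u, round(bod_score_2d(units, u), 4))
```

In this toy data set the first two units each excel in one indicator and obtain a score of 1, while the third unit, which is dominated on both indicators, obtains 2/3 even under its most favourable weights.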
Zanella et al. [73] underline that the directional distance function is comparable to Shephard's output distance function whenever the directional vector is fixed at the indicators' values of the evaluated unit, i.e. $g = (-g_b, g_y) = (-b_{j_0}, y_{j_0})$. The choice of the directional vector plays a key role in dealing with negative data; negative data cannot be straightforwardly included in a DEA model, and several approaches have been proposed to overcome this drawback (see for example [2,48,55,68]). Among them, [33] and [47] encompass negative data by defining the following directional vector: $g = (-g_b, g_y) = (-|b_{j_0}|, |y_{j_0}|)$. Their suggestion seems the most suitable one for the current analysis, where several water utilities exhibit negative values in the economic profitability indicators.
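The role of the absolute values in the directional vector can be illustrated with a short sketch. The function names and the numbers are hypothetical; the point is only that, with a negative desirable indicator such as a negative ROE, a step along $|y_{j_0}|$ still moves the indicator upwards, whereas a step along the raw value would move it further down.

```python
def direction_vector(desirable, undesirable):
    """Signed directional vector (-|b|, |y|) built from the unit's own indicators."""
    g_b = [-abs(b) for b in undesirable]  # undesirable indicators are contracted
    g_y = [abs(y) for y in desirable]     # desirable indicators are expanded
    return g_b, g_y


def project(desirable, undesirable, beta):
    """Move the unit by a step `beta` along its directional vector."""
    g_b, g_y = direction_vector(desirable, undesirable)
    new_y = [y + beta * g for y, g in zip(desirable, g_y)]
    new_b = [b + beta * g for b, g in zip(undesirable, g_b)]
    return new_y, new_b


# Hypothetical unit: a negative ROE, a positive EBITDA margin, 40% water loss.
y0 = [-0.10, 0.20]
b0 = [40.0]
new_y, new_b = project(y0, b0, beta=0.5)
print(new_y, new_b)
```

After the step, both desirable indicators are higher than before (the negative one is closer to zero) and the undesirable one is lower, which is the simultaneous expansion and contraction described above.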
According to the dual formulation of Problem (1), the weights are chosen in the most favourable way for each unit, without assigning any scale of importance to the single indicators. In some applications, the decision maker's preferences should be taken into account, and then the indicators cannot be put on an equal footing (see for example [21,73]). In the current analysis, water utility performance is evaluated paying particular attention to the social and environmental sustainability of the utilities' actions. In this light, greater importance is assigned to the quality and environmental indicators, and this is done by including assurance region type I (ARI) weight restrictions. More precisely, the dual formulation of Problem (1) allows constraints regarding the relative importance of each indicator to be added, and then the water utility performance can be assessed by solving the following maximization problem [73]: where $\bar{y}_r$ and $\bar{b}_k$ are related to the indicators of the ideal average unit. The last two constraints require that the percentage contribution of any indicator must vary within a specific range, namely $[\phi_r, \psi_r]$ for the desirable indicators $r = 1, \dots, s$ and $[\phi_k, \psi_k]$ for the undesirable ones $k = 1, \dots, l$. In this way, the weight constraints are the same for every evaluated unit. Once the maximum value $\beta$ for Problem (2) is determined, the performance measure is given by $CI_{j_0} = 1/(1 + \beta)$. The best performing units have a score equal to one, while for the others the composite indicator scores vary from zero to one.
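As a hedged illustration of the structure described in this paragraph, one way to write the multiplier (dual) form of a directional distance BoD model with ARI restrictions is sketched below. This is a reconstruction under assumptions and may differ from the authors' exact Problem (2): the symbols $u_r$, $p_k$ and $v$ for the dual variables and $\phi$, $\psi$ for the bounds are notational choices made here, and the problem is written as a minimisation, which is the form the linear programming dual of a maximisation in $\beta$ naturally takes.

```latex
\begin{aligned}
\min_{u,\,p,\,v} \quad & -\sum_{r=1}^{s} u_r\, y_{r j_0} + \sum_{k=1}^{l} p_k\, b_{k j_0} + v \\
\text{s.t.} \quad & \sum_{r=1}^{s} u_r\, g_{y_r} + \sum_{k=1}^{l} p_k\, g_{b_k} = 1, \\
& -\sum_{r=1}^{s} u_r\, y_{rj} + \sum_{k=1}^{l} p_k\, b_{kj} + v \ge 0, \quad j = 1, \dots, n, \\
& \phi_r \le \frac{u_r\, \bar{y}_r}{\sum_{r'=1}^{s} u_{r'}\, \bar{y}_{r'} + \sum_{k'=1}^{l} p_{k'}\, \bar{b}_{k'}} \le \psi_r, \quad r = 1, \dots, s, \\
& \phi_k \le \frac{p_k\, \bar{b}_k}{\sum_{r'=1}^{s} u_{r'}\, \bar{y}_{r'} + \sum_{k'=1}^{l} p_{k'}\, \bar{b}_{k'}} \le \psi_k, \quad k = 1, \dots, l, \\
& u_r \ge 0, \quad p_k \ge 0, \quad v \ \text{free}.
\end{aligned}
```

Without the two ARI constraints, linear programming duality implies that the optimal value of this problem equals the optimal expansion factor of the envelopment problem. Because each ratio constraint can be multiplied through by its (positive) denominator, the restrictions remain linear in the weights. The optimal value then plays the role of $\beta$ in the score $1/(1 + \beta)$.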

A robust and conditional approach of the directional distance function composite indicator: Outlying observations and operating context
In the estimation of Problem (2), there are two more aspects that need to be taken into account. First, we have to tackle the presence of outlying observations (if any) in the sample under analysis, to avoid downwardly biased estimates [25]. The literature has proposed a number of approaches to handle this issue. Among others, outlier detection procedures have been suggested to identify potential outlying observations, such as atypical observations and/or measurement errors, and to remove them from the sample. However, this could result in the removal of some observations that, in their atypical magnitude, might carry very useful and relevant information [17]. Neglecting this piece of information would work against the aim of performing a fair and comprehensive benchmarking exercise. Therefore, we opt for a different solution. Following the seminal paper by Cazals et al. [9], as adapted to the composite indicator framework by De Witte and Rogge [19], Rogge et al. [56] and Lavigne et al. [36] among others, we adopt a robust approach. The main intuition is to mitigate the influence of outlying observations by running a Monte Carlo simulation. Specifically, we draw with replacement $m < n$ observations, and we repeat the draw a high number of times (e.g. $B = 2000$). For each of the $B$ draws, we compute a water utility performance composite indicator $CI^{b,m}$ for the $m$-size sample. Then, we compute the robust composite indicator $CI^{m}$ as the arithmetic average of the computed $CI^{b,m}$. Due to the sub-sampling, the unit under analysis might not be part of the reference set. In this case, unlike the non-robust CI score, the robust CI score can take values larger than one. Accordingly, the evaluated water utility is deemed super-performing and interpreted as doing better than the average randomly drawn utilities in the reference sample [20].
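The sub-sampling logic can be sketched in a few lines. To keep the example self-contained, the score computed in each draw is the BoD score for a single positive desirable indicator, which reduces to the unit's own value divided by the best value in the reference set; this simplification, the function names and the data are assumptions made for illustration, and in the full model each draw would require solving the directional distance problem instead.

```python
import random


def bod_single(y0, reference):
    """BoD score with one positive desirable indicator: own value over the best one."""
    return y0 / max(reference)


def robust_ci(y0, sample, m, B=2000, seed=0):
    """Average of B scores, each computed against m units drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    total = 0.0
    for _ in range(B):
        reference = rng.choices(sample, k=m)  # sampling with replacement
        total += bod_single(y0, reference)
    return total / B


# Hypothetical indicator values for five units.
sample = [0.2, 0.4, 0.5, 0.6, 0.9]
print(bod_single(0.9, sample))    # full-sample score of the best unit
print(robust_ci(0.9, sample, m=3))
print(robust_ci(0.2, sample, m=3))
```

The best unit has a full-sample score of exactly one, but its robust score exceeds one because in many draws it is not part of the reference set, which is the super-performing case mentioned above.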
The second aspect concerns the context in which the unit, in our case the water utility, has to operate. Working in a certain environment or institutional setting rather than another might favour or hamper the achievement of a certain level of performance. For this reason, we should account for those background variables that are not under the direct control of the managers, but that still might exert an influence on the distribution of the performance scores and on the attainable set. The literature presents several ways to investigate the role of these characteristics and different solutions have been adopted in the water utility framework as well (see for example [24,42,71]). In this context, we consider a conditional approach, following the path traced by Daraio and Simar [13,15] and adjusted to the composite indicator context as in [20,25,36] among others. The choice of this approach leads to two main advantages. First, it avoids imposing the so-called ''separability condition'' by including the contextual variables in the estimation of the score in one stage (see, among others, [13,14,16,56]). Differently from the unconditional case, where each unit is equally likely to be drawn in the sampling with replacement, in this case the units are still drawn with replacement, but with a particular probability estimated by means of a kernel function on the background variables. By performing the sampling in this way, the unit under analysis is compared with units operating in an environment more similar to its own. At this point, the water utility is fairly assessed not only because of the favourable weighting system, but also because of the units it is compared with. The robust conditional water utility composite indicator is obtained as the arithmetic average of the conditional composite indicator, computed B times.
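The conditional sampling step can be sketched in the same spirit. The example below is a minimal illustration, assuming a single continuous background variable z, a Gaussian kernel and a hand-picked bandwidth h; these choices, the function names and the data are all hypothetical, and the same one-indicator ratio as a stand-in for Problem (2) is used. The references cited above describe the kernels and bandwidth selection actually suited to mixed (categorical and continuous) context variables.

```python
import numpy as np


def kernel_probabilities(z, i, h=1.0):
    """Drawing probabilities for unit i: Gaussian kernel weights on the distance in z."""
    u = (z - z[i]) / h
    k = np.exp(-0.5 * u ** 2)
    return k / k.sum()


def robust_conditional_score(y, z, i, m=20, B=2000, h=1.0, seed=0):
    """Average of B stand-in scores; reference units are drawn with kernel probabilities."""
    rng = np.random.default_rng(seed)
    p = kernel_probabilities(z, i, h)
    # Units with a background similar to that of unit i are more likely to be drawn
    draws = rng.choice(len(y), size=(B, m), replace=True, p=p)
    return float(np.mean(y[i] / y[draws].max(axis=1)))


# Two hypothetical contexts: z = 0 (unfavourable, low y) and z = 10 (favourable, high y)
y = np.array([0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0])
z = np.array([0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0])
i = 3  # best unit among those facing the unfavourable context
print(robust_conditional_score(y, z, i, h=1.0))   # compared mostly with similar units
print(robust_conditional_score(y, z, i, h=1e6))   # very large h: nearly uniform drawing
```

With a very large bandwidth the kernel weights become almost equal, so the second call approximates the unconditional robust score. In this toy setting the unit facing the unfavourable context scores higher once it is compared mainly with units in a similar context.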
A robust conditional score larger than one suggests that the water utility under analysis is doing better than the average utility in the reference sample facing a similar background. As a second benefit of performing a conditional analysis, we can explore the potential influence of the background characteristics beyond the benchmarking exercise. Non-parametric statistical inference on the relationship between the estimated composite indicator scores and the background variables under scrutiny can be carried out. For an intuitive interpretation of the results (see, among others, [56]), we non-parametrically regress the ratio between the robust unconditional and the robust conditional scores on the operating context variables, and the statistical significance can be obtained (see, among others, [16,37]). In this way, a positive slope in the partial regression plot denotes a favourable influence of the considered variable on the water utility performance level, and the opposite holds for a negative slope. We refer the interested reader to all the above-mentioned references for a more extensive and technical discussion.
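To make the reading of the slope concrete, the sketch below regresses a simulated ratio of unconditional to conditional scores on one continuous background variable with a local-constant (Nadaraya-Watson) estimator. This is an assumption made for illustration only: the estimator, the Gaussian kernel, the bandwidth, the function name and the data are hypothetical and are not taken from the procedures in the cited references, which also provide the inference on statistical significance.

```python
import numpy as np


def local_constant_fit(z, ratio, grid, h):
    """Nadaraya-Watson estimate of E[ratio | z] at each grid point (Gaussian kernel)."""
    u = (grid[:, None] - z[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w * ratio[None, :]).sum(axis=1) / w.sum(axis=1)


# Simulated example: the ratio increases with the background variable z
rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 93)
ratio = 0.9 + 0.3 * z + rng.normal(0.0, 0.01, size=z.size)

grid = np.array([0.2, 0.5, 0.8])
fitted = local_constant_fit(z, ratio, grid, h=0.1)
print(fitted)  # increasing fitted values would be read as a favourable influence of z
```

An upward-sloping fitted curve corresponds to the favourable case described above, and a downward-sloping one to the unfavourable case.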

Empirical application
In this section, we present the results on the comprehensive performance level assessment of 93 Italian water utilities in 2013. We obtain this evidence by using the model specification described in the previous section. Specifically, we estimate the water utility performance score along different dimensions, namely the economic and the financial ones as well as the environmental and service quality ones. Before exploring the main findings, two considerations need to be made and kept in mind while interpreting the results.
As mentioned in Section 3.1, imposing the weights restricts the role that each indicator can play in the overall performance assessment and the extent to which it can contribute to the water utility performance score. In the context of the water industry, the role of the regulator is to mediate the interests of all the involved stakeholders, including the companies, the customers and ultimately the environment itself. The measures adopted by the national authority in the last decades reflect this order of priorities as well. For example, penalties have been envisaged for those water utilities that infringe European laws, in order to safeguard the environment. Similarly, tariff penalties have been laid down for those companies that provide a disservice to customers. Following this rationale, incentive measures have also been taken to reward the companies that comply with the law and achieve good service targets [61]. The present analysis reflects the priorities of the national regulator on the different dimensions under scrutiny to fairly account for this multidimensional framework. Specifically, referring to Problem (2) in Section 3, lower and upper bounds have been set for each indicator, so as to give the following scale of importance among the analysed dimensions: 1. environmental, 2. service quality, 3. economic sustainability, 4. financial sustainability. Service quality indicators share the same lower and upper bounds. ROA and Financial autonomy have the highest upper bound among the economic and the financial indicators, since they are strictly linked to the invested capital. Since water utilities are capital-intensive firms, particular attention should be given to the investment side. On the contrary, dividend distribution is not very frequent in the water sector and the Water Authority encourages firms to reinvest their earnings to improve the quality of the provided services [58].
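The effect of lower and upper bounds on the contribution of each indicator can be illustrated with a small linear program. The sketch below is an assumption-laden illustration, not the model of Problem (2): it uses the classic Benefit-of-the-Doubt formulation with bounds on each indicator's share of the score (rather than the directional distance version), relies on SciPy's linprog, and uses hypothetical function names, made-up data and made-up bounds.

```python
import numpy as np
from scipy.optimize import linprog


def bod_score(Y, c, lower=None, upper=None):
    """Classic BoD score of unit c. Y has shape (units, indicators), all values positive.

    lower/upper bound the share of each indicator in the score of unit c:
    lower_i <= w_i * y_ci / sum_k(w_k * y_ck) <= upper_i.
    """
    n, q = Y.shape
    A_ub = [Y]                # sum_i w_i * y_ji <= 1 for every unit j
    b_ub = [np.ones(n)]
    if lower is not None:
        # lower_i * sum_k(w_k * y_ck) - w_i * y_ci <= 0
        A_ub.append(np.outer(lower, Y[c]) - np.diag(Y[c]))
        b_ub.append(np.zeros(q))
    if upper is not None:
        # w_i * y_ci - upper_i * sum_k(w_k * y_ck) <= 0
        A_ub.append(np.diag(Y[c]) - np.outer(upper, Y[c]))
        b_ub.append(np.zeros(q))
    # linprog minimises, so the objective is negated; weights are non-negative by default
    res = linprog(-Y[c], A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub), method="highs")
    return -res.fun


# Four hypothetical units, three hypothetical indicators (higher is better)
Y = np.array([[0.9, 0.2, 0.5],
              [0.4, 0.8, 0.6],
              [0.3, 0.3, 0.9],
              [0.5, 0.5, 0.5]])
lower = np.full(3, 0.1)   # hypothetical bounds: each indicator contributes at least 10%
upper = np.full(3, 0.6)   # ... and at most 60% of the score
for c in range(len(Y)):
    print(c, round(bod_score(Y, c), 4), round(bod_score(Y, c, lower, upper), 4))
```

Since the bounds shrink the set of admissible weights, the restricted score of a unit can never exceed its unrestricted score. Assigning different upper bounds to different indicators is one way to encode a scale of importance among dimensions.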
To balance the interests of all the involved stakeholders, the chosen upper bounds impose that the environmental and quality dimensions give a relatively higher contribution than the financial and the economic ones. If the evaluation had been performed reflecting the priorities of the companies, the weights assigned to each indicator would most likely have been different, and the same reasoning applies if the customers' priorities had been considered. 10 Moreover, the performance scores have been estimated by accounting for some operating context variables deemed important in the literature, namely the size, the ownership, the geographical location and the diversification. In Section 2.2 a few patterns have already emerged by looking at the descriptive statistics of the performance indicators grouped by different categories. In the following we show the potential of including these variables all together, so as to get a broader picture and to capture synergies among these characteristics. Table 7 shows the results of the estimated robust conditional water utility performance composite indicator (WUP CI). For completeness and comparison purposes, we also report the non-robust and the robust unconditional versions of the advocated composite indicator. 11
The robust conditional approach, together with the non-robust and the robust unconditional ones, offers a more integrated picture of the units under analysis. In fact, the non-robust case can be seen as a reference starting point. The robust approach indicates whether atypical observations that might affect the findings of the non-robust analysis are present, and it mitigates their influence. The conditional analysis allows us to explore the role of external variables in the overall evaluation, both in the score estimation process and in the comparison between the robust unconditional scores and the conditional ones. Taken as a whole, we get a more exhaustive idea of the framework under investigation while accounting for different aspects. Turning to the computed scores, the mean of the non-robust unconditional score is equal to 0.6143, suggesting sizeable room for performance improvement by looking at the best practices detected in the sample, namely 38.57%, obtained as 1-0.6143. Once the robust approach is considered to take into account the possible presence of atypical observations, the score slightly increases to 0.6946. Then, the potential room for improvement suggested when both the atypical observations and the operating context are accounted for amounts to 26.06%. This number should represent an indicative starting point for each water utility to boost its own level of performance, while looking at the best practices and complying with the priorities of the national regulator. Before moving the discussion to the influence that the context has on the detected level of performance, the descriptive statistics display at least a couple of further, very informative aspects.
10 We have also estimated two different versions of the composite indicator following opposite extreme cases with respect to this baseline model. Moving away from the balancing goal of the regulator, one extreme scenario is to place very high priority on the environmental and quality service indicators, while the opposite extreme places it on the economic and financial indicators. In either case, the change in the importance of each indicator's weight does play a role in the Water Utility Performance Composite Indicator scores. We refer the reader to the Appendix for an extensive discussion of the main findings.
11 The estimates have been obtained by choosing m=20 after a sensitivity analysis and B=2000. For more technical details we refer to Section 3.2.
First, the minimum values and the first quartile denote the presence of very poorly performing water utilities. Second, at the other extreme, the maximum values of the robust unconditional and conditional scores suggest the presence of at least one super-performing water utility. Fig. 1 provides a graphical comparison of the water utility performance composite indicator in its non-robust, robust unconditional and robust conditional versions (for other applications, see also [19,56,70], among others). The comparison between the robust and non-robust CI scores in terms of ranking shows that the observations (the water utilities) lie along the 45-degree line. This implies a negligible role of the robust approach in the sample under analysis as far as the ranking is concerned. Moreover, all the robust scores are consistently higher than in the non-robust case. By comparing the conditional and the unconditional CI scores, we can appreciate the importance of taking into account the operating context to ensure a fairer benchmarking exercise. When looking at the distribution according to the rank, we see that many observations lie below the 45-degree line. This suggests that these water utilities benefit from the conditional assessment and specifically from accounting for an unfavourable context that might hamper the achievement of a higher level of performance.
We explore the potential relationship between the water utility performance and some operating context variables usually considered in the literature. Specifically, we compute the ratio of the unconditional over the conditional water utility performance composite indicators to draw some considerations. We caution the reader that this is based on correlational evidence, so we refrain from giving a causal interpretation of the findings. We report the partial regression plots in Fig. 2 to explore whether background conditions have either a favourable or an unfavourable influence on the performance assessment [56,65,70]. 12
12 To focus the discussion of the results, we report only the evidence from the complete conditional model, accounting for all four context variables. The inclusion of fewer ones delivers similar results, which are available upon request from the authors.
Regarding the size, we can observe that large utilities have a positive and statistically significant relationship with the water utility performance, as suggested by the narrow and non-overlapping confidence intervals. On the contrary, operating as a medium utility has the most unfavourable influence on the water utility performance. This evidence is in line with previous findings on the Italian water industry framework (see for example [29]), suggesting that water utilities can perform better when they are big players or, to some extent, when they remain small local firms [63]. However, it is worth pointing out that the role of the size as an operating context variable is still controversial. Previous studies have not unequivocally identified the ''optimal dimension'' for a water utility and the findings strongly differ from one country to another [6,7,29]. Concerning the ownership, we do not find major differences in the way the ownership models correlate with the ratio between robust unconditional and conditional scores, and the confidence intervals are quite broad. This confirms what has already been found in the literature (see for example [49,50] and the literature reviews by Guerrini et al. [30] and Berg and Marques [4]). One possible explanation is that this topic has long been debated in the last decades and many reforms have been enacted in this regard, giving rise to a more balanced coexistence of these different organizations. Moreover, privatization processes do not necessarily lead to improvements in efficiency and quality of service provision [3,35], nor necessarily to tariff increases or decreases [27,59]. The geographical location presents a pattern consistently detected in other papers, that is, the South shows a significant unfavourable influence when assessing the performance of the water utilities compared to the Northern region and even more to the Centre. Most of this evidence can be interpreted in the light of two main aspects.
First, the morphological characteristics and the climate of the South cause long drought periods and hence water utilities face many difficulties in guaranteeing the water provision. Second, there are better infrastructure and more investments in the Centre and in the North compared to the southern regions. The southern mains are generally old and so, paradoxically, the South is characterized both by water scarcity and by the highest rate of water loss in Europe [29]. As for the diversification, being a mono-utility shows a significant unfavourable influence on the performance of a water utility. The positive correlation with being a multi-utility might come from a higher level of performance obtained by taking advantage of potential synergies arising from operating in different sectors, such as gas and electricity.

Conclusion
The strategic role of water provision and sanitation for sustainable development entails the necessity to regulate the water utilities' activity. The regulatory evolution, the new objectives and commitments of national and local Water Authorities and the new paradigm of the Circular Economy require the water sector to be evaluated from a multidimensional perspective. Therefore, beyond the traditional economic and financial criteria, the detection of best practices and virtuous behaviours should also include the environmental and the quality issues of the service. In this light, we have proposed an innovative use of a non-parametric composite indicator, here labelled Water Utility Performance Composite Indicator (WUP-CI), to aggregate the following dimensions: the economic profitability, the financial solvency, the water losses and the customer satisfaction. The composite indicator has been obtained by running a directional distance BoD-model in its three versions: the non-robust, the robust and the robust conditional one. The model turns out to be suitable for dealing with crucial issues of the analysis, such as the presence of indicators with negative values, the presence of ''undesirable indicators'', and the necessity of including in the analysis both the stakeholders' preferences and the background characteristics. The WUP-CI has then been used to evaluate the performance of 93 Italian water utilities that provide the integrated water service, including distribution, sewerage and wastewater activities. The presence of weight restrictions in the model has allowed us to place more emphasis on the environmental and the quality dimensions than on the economic and the financial ones, in accordance with the Service of General Economic Interest (SGEI) nature of the urban water service.
Moreover, alternative scenarios for stakeholders' preferences have also been considered, showing that different priorities do play a role in the overall performance assessment. The empirical analysis showed that the Italian water industry needs policies and strategies to improve the performance of water utilities. Decision makers can get valuable information by looking at the best practices that emerge from the analysis. Specifically, the evidence can suggest where to intervene and to what extent, so as to boost the performance of those water utilities with unsatisfactory results. Moreover, the analysis can contribute to the long-standing debate on what the relevant operating context variables are and how they influence the water utilities' performance. With reference to the size, our results show that being a large utility is favourably related to performance, while the medium size is the most unfavourable condition. Operating in the South of Italy and being a mono-utility company can also be considered unfavourable conditions as compared to the background of the water utilities belonging to the other groups. Therefore, specific policies to support southern regions and to foster the performance of mono-utilities should be defined to close their gap. With respect to the ownership, no significant performance differences emerge between the totally publicly owned operators and the partially publicly owned ones. This finding is in line with the strand of the literature demonstrating that privatization processes should not be enforced, leaving the management model choice to decision makers on the basis of context specificities.

[...] baseline model as presented in Section 4. The baseline composite indicator is constructed so as to reflect the role of the regulator in mediating the interests of all the involved stakeholders (e.g., the companies, the customers and the environment), looking for a balance between the increasing demands of the legislator on environmental and service quality issues and the market power of the companies themselves.
Moving away from the balancing goal of the regulator, one extreme scenario is to place very high priority on the environmental and quality service indicators, while the opposite extreme on the economic and financial indicators.
In Fig. A.1 we present three scatter plots of the Water Utility Performance Composite Indicator (WUP CI) ranks for different scenarios, to show how the change in the importance of each indicator can affect the final ranking. Fig. A.1(a) provides a graphical comparison of the WUP CI ranks between the baseline model and the scenario where very high importance has been assigned to economic and financial indicators. Fig. A.1(b) compares the WUP CI ranks of the baseline model and the scenario where very high importance has been assigned to environmental and service quality indicators. In both cases, many observations lie below or above the 45-degree line. This points at the fact that the overall water utility performance evaluation might be subject to different outcomes depending on the interests pursued by the analysis. In the baseline model, the detected best practices should guide the regulator in identifying the most critical areas and in setting the system of penalties and tariffs accordingly from a policy perspective.
Reasonably, there will be water utilities that turn out to be penalized by this kind of assessment, either because they devote most of their attention to the economic and financial dimensions (one extreme) or because they are very committed to the environmental cause or to customer satisfaction (the other extreme); we refer to [62] for a critical discussion of the Italian framework in this regard. In the former case, the water utilities overlook their impact on the environment and offer a lower service quality to the customers. In the latter case, the water utilities neglect the economic and financial sustainability required by their capital-intensive nature, in favour of an excessive attention to the environment and the quality that eventually backfires on the customers by imposing higher tariffs. Taking this argument to the extreme, Fig. A.1(c) provides a graphical comparison of the WUP CI ranks between these two extreme scenarios. In this case the impact on the ranking and on the detected best practices is sizeable. Similar patterns emerge when the non-robust and the robust unconditional versions of the Water Utility Performance Composite Indicator are considered.
Fig. A.3. Visualization of the partial regression plots with bias-corrected bootstrapped nonparametric confidence intervals for the operating context variables. Very high importance to environmental and service quality indicators. Source: Authors' own elaboration.
Following the same rationale, we question whether the role of the structural aspects considered in this analysis might change depending on these two extreme evaluation scenarios. Figs. A.2 and A.3 report the partial regression plots to explore the role of the contextual variables on the Water Utility Performance Composite Indicator (WUP-CI) estimated giving very high importance to the economic and financial aspects, or to the environmental and service quality ones. The main considerations that can be drawn concern mostly the geographical location and the size of the water utilities. In the scenario where very high importance is given to the economic and financial indicators, the Centre displays a negative relationship with the water utility performance compared to the North and the South. This pattern is slightly reversed in the baseline scenario, and the reversal becomes very pronounced once very high importance is attributed to the environmental and service quality indicators. A plausible explanation can be found in the strong commitment undertaken by the water companies located in the Centre of Italy to sustainable water and information campaigns to support it, confirming the evidence provided by previous studies on the Italian water industry framework (see for example [62]). As for the size, in the scenario with greater importance on the economic and financial indicators and in the baseline model, being a large firm (>250 employees) shows a significant favourable influence on the performance of the water utility and being a medium firm (50-250 employees) displays the most unfavourable influence. This is no longer the case when very high importance is given to the environment and service quality. In fact, smaller water utilities could be keener to accept customers' and citizens' requests in terms of the service provided and of attention to avoiding unnecessary environmental impact and waste of precious resources such as drinkable water.
For this reason, including environmental and service quality indicators in performance assessment helps to improve the evaluation of smaller utilities with respect to medium and big ones. The overlapping confidence intervals prevent us from inferring any statistically significant difference among size classes. Concerning the degree of diversification, patterns similar to the baseline model emerge, that is, being a multi-utility has a positive relationship with the utility performance. In both scenarios, there are no statistically significant differences among different types of ownership, confirming the evidence arising from the baseline scenario. This result confirms that publicly owned management models (direct or delegated) are able to obtain the same results in terms of performance as totally or partly privatized models when environmental and service quality are also included in the analysis. This result supports previous findings [61] showing that publicly owned utilities performed only slightly worse than public-private partnerships when quality issues are included in the efficiency assessment. Thus, privatization should be evaluated with caution when only economic and financial indicators are included in the analysis, since other relevant aspects of the mission of public utilities (in terms of environmental sustainability and service quality to citizens) are omitted [53].
Overall, we can conclude that the relative importance given to the economic and financial aspects, as opposed to the environmental and quality ones, does matter for the performance assessment.