Social Uncertainty Evaluation of Social Impact Bonds: A Model and Practical Application

In recent years, Social Impact Bonds (SIBs) have gained popularity in the impact investing space. A number of scholars and practitioners are debating, in theory and practice, the opportunities, challenges, and obstacles of these financial models. Among others, social uncertainty evaluation metrics appear to be a critical factor for the future development of the SIB market. The present work aims to shed some light on this issue by realizing a practical application of a model, an extension of a previously proposed framework, for social uncertainty evaluation in SIBs. In our exploratory analysis, 34 SIBs were selected for the empirical tests. We combined the Analytic Hierarchy Process (AHP) with the construction of aggregate measures derived from suitable indicators at the leaves of the tree. Our findings open new avenues for future research on uncertainty factors in the SIB landscape. Finally, our results represent a basis for implementing a prediction model for social uncertainty evaluation.


Social Impact Bonds and Social Uncertainty: Setting the Context
The purpose of this exploratory study is to realize a practical application of a model for social uncertainty evaluation in Social Impact Bonds (SIBs).
Since their introduction in the United Kingdom in 2010, SIBs have become popular in the academic, practitioner, and policy-maker worlds [1,2]. In the impact investing arena, SIBs are an interesting innovative financial mechanism [3] that seems to promise a solution for financing complex social interventions [4] and for reallocating performance and financial risk from the public towards the private sector ([5], p. 41). More in detail, SIBs have been defined as "payment by results contracts that leverage private social investment to cover the up-front expenditure associated with welfare services" ([6], p. 57). According to ([7], p. 1), payment-by-results (PbR) is a scheme for "delivering public services where the government (or the commissioner) pays providers for the outcome they achieve rather than the activities they deliver".
A literature overview provided by ([8], p. 72) discusses the main characteristics of PbR: (1) contingent payments depend on independent verification of results; (2) PbR contracts should include both rewards and penalties useful for achieving the outcomes; (3) risk transfer exists and depends on both the impact and the success of the project. SIBs are a form of PbR, but they extend it by harnessing social investment from capital markets to cover (partly or fully) the costs of the service intervention [6].
Therefore, investor return depends on the impact of the project and on the outcomes achieved [9]. The key point in SIB (and PbR) schemes is outcome measurement and, accordingly, the quality of the performance measure (see Section 2.1).
On the theoretical level, several scholars note that the opportunities related to the development of the SIB market are vast and perhaps endless [5], but, at the same time, challenges and obstacles are often highlighted [10]. The complexity of SIB schemes is evident (see Section 2.1): they are characterized by public-private partnerships [1] between several counterparties, and each actor has different interests, goals, and expectations, as well as different risk attitudes and perceptions [11].
On the empirical level, examining the evidence from the SIB market, it is worth noticing that the diffusion of SIBs is still modest. From the launch of the first pilot SIB in the UK (HMP Peterborough) in 2010 to the end of October 2019, just over 130 SIBs have been implemented worldwide [12]. Among these, only 34 are closed. Clearly, the limited size of the SIB market clashes with the theoretically great potential of SIBs.
The risk (or uncertainty) issue represents one of the main controversial points related to the use of SIBs on a larger scale [5,13,14] and, at the same time, it is still underexplored [15–17]. It is clear that the complexity of the SIB model increases the probability of SIBs failing. Moreover, some scholars are underlining the downsides and risk facets of SIB models (see Section 2.1).
More generally, with regard to the impact investing field, of which SIBs are a key component, some scholars are exploring, amongst other things, social risk [18–22], impact risk [23–26], social uncertainty [27,28], and their linkages and differences [27], also using methods and approaches very different from those of traditional mainstream finance. Impact investing is considered a "revolutionary" way to improve sustainability: it is characterized by explicit intention (intentionality), social purpose, and the ability to generate positive environmental and/or social impact in accordance with an appropriate risk-return profile [11]. It is important to note that "intentionality" represents the linkage between ethics and facts, thus determining a consequentiality between (positive) values and value creation. These innovative approaches open new views for research on how finance should be "reconsidered" [29–31], and also question the foundations of mainstream finance [32].
In order to shed light on these open problems, some works are focusing on SIB practices to better understand the risks (or uncertainty) related to SIB contractual schemes and social and financial features [1,5,16]. In light of social impact investing approaches, a constant focus on social risk (rather than on other risks) certainly favors a more effective allocation of social finance capital and reminds us that the first purpose of social finance is to bring about maximum social change rather than to make money ([33], p. 308).
To the best of our knowledge, only very few works, such as Scognamiglio et al. [5,22], have proposed a framework able to support the evaluation of an SIB starting from a measurement of social uncertainty (for a discussion of these previous works see Section 3.1, in which we also explain the main alternative approaches to impact evaluation). These works open promising new avenues for research, useful for promoting the development of SIB models on a large scale.
However, regarding SIBs, there are currently no definite and effective models (already widely tested and used) capable of evaluating social uncertainty. Starting from these works, we attempt to test an implemented model, an extension of the framework proposed by [5], capable of providing final scores for the social uncertainty evaluation of the overall population of SIBs closed worldwide as of the end of October 2019. As far as we are aware, this is the first analysis using all of the extant closed SIBs. Our work offers some preliminary but significant results on social uncertainty, which is one of the most relevant and underexplored issues of SIBs. In addition, it proposes a quantitative measure of several qualitative key elements related to SIB programs, providing useful suggestions for investors and other stakeholders. It then realizes an in-depth comparison of several SIB programs and an insight into market practices. Finally, this pilot study is a retrospective analysis and represents a first step towards implementing a prediction model for social uncertainty evaluation.
In our work, we use the following definition of social uncertainty provided by ([5], p. 42): "most definitions interpret social uncertainty as the risk of not reaching the intended impact [18] or as the likelihood that a given allocation of capital will generate the expected social outcomes irrespective of any financial returns or losses [27]. Despite this heterogeneity of definitions, social uncertainty could be considered more generally as a concept providing an indication of the certainty that an output will lead to the stated impact [26]." The choice to focus on uncertainty (rather than risk) deserves some clarification. Knight [34] established a clear difference between the concepts of "risk" and "uncertainty". The famous phrase "uncertainty is an unknown risk, whilst risk is a measurable uncertainty" underlines the fact that risk refers to situations that can be quantitatively measured through a distinct probability distribution; on the other hand, uncertainty refers to those events for which there is not enough knowledge to identify objective probabilities. Therefore, Knight's revolutionary works shed light on different facets of the concept of uncertainty and show the limitedness of the mechanical analogy in understanding enterprise and profit ([35], p. 458). In this vein, Knight's view is a challenge to neoclassical theory [35]: uncertainty arises out of partial knowledge ([35], p. 459), and this concept represents the turning point for advances in psychological decision theory, such as a host of psychological insights beyond the risk-uncertainty distinction ([36], p. 458). In more detail, Knight described features of risky choices that were to become the key components of prospect theory (Kahneman and Tversky [37]): the reference-dependent valuation of outcomes and the non-linear weighting of probabilities.
Based on these considerations, Knight's view is a useful starting point for our exploratory analysis: risk is objective, since it represents a variable independent of the individual; uncertainty is, on the contrary, subjective, because it depends on the individual, their information, and their temperament. Hence, a decision made under conditions of uncertainty has a higher impact than one made under conditions of risk, both for the decision maker and for the context in which they operate. The tools used to analyze risk and uncertainty differ, too: for risk, statistical techniques are used, whilst for uncertainty, heuristic ones are used [38].
From this point of view, our application appears particularly interesting because its focus is on social uncertainty (rather than social risk), following Knight's view: it overcomes the limitations of the traditional finance paradigm (and of its mechanistic models), leading towards more complete foundations for building momentum for systemic change. In addition, our analysis uses a modified Analytic Hierarchy Process (AHP) that combines, at each leaf of the hierarchy tree typical of such multi-criteria methods, indicators with appropriate measuring scales capable of describing the characteristics of the problem. Compared to previous methods [5,39,40], our implemented technique offers the advantage of more flexible adjustments and implementations in terms of levels of investigation, indicators, and scales. These characteristics seem to be particularly useful for better understanding SIB model dynamics and peculiarities (see Sections 2.1 and 3.1).

The SIB Model: Key Elements
Built around a collaborative public-private contract, SIBs represent a new funding model that aims to improve service quality and to enhance the social outcomes achieved by using private resources rather than public funding. Proponents of SIBs often suggest that these new schemes have the capacity to leverage additional resources for innovative services that will help, in the near future, to improve social outcomes and cost savings for the public commissioner [6]. The introduction of private principles and actors through outcome-based commissioning has received a great deal of attention in recent years [6]. The SIB model was first proposed in Peterborough (United Kingdom) in 2010 and has quickly spread internationally and across different sectors [13].
SIBs involve payment-by-results (PbR) through a contract between public service providers and private investors [41], in which investors provide the upfront financing to service providers for the interventions that target a social outcome. At the same time, the commissioner (also named the outcome payer) makes PbR payments based on the level of social outcomes achieved. The interplay between all the involved actors is represented in Figure 1. The implementation of SIB projects begins when the commissioner (usually a public entity, at the national or local level) identifies a certain social need and target population and enters a contract with the intermediary [42]. After receiving the working capital needed from the intermediary, the service provider delivers a pre-agreed-upon set of outcomes [17]. The outcome measurement represents a key element for SIBs: the independent evaluator assesses their final level and only in the case of achievement of the pre-defined social outcomes does the commissioner provide payments.
In summary, the SIB structure transfers the risk of poor performance from the commissioner to the investors; investors receive a financial return for taking this risk, as well as a social return through the outcomes achieved (a 'blended return').
The combination of the different actors involved in an SIB project is part of the innovation [43], and it enables the public sector to commission innovative services by sharing the risk of exploring a new welfare approach and investors to provide working capital for social projects by receiving both a financial and a social return [42].
SIBs are not suited to every project, but rather depend on specific criteria [42,44], such as the presence of a clear and measurable outcome.
Despite the growing interest shown by both academics and practitioners, the number of SIBs activated worldwide is exhibiting a slowdown (see [12]).
From a theoretical point of view, recent works seem to focus on SIBs' downsides by criticizing the marketization of the delivery of traditionally public services, their effective ability to lead to better outcomes, and the value they provide compared with their cost structure and the additional transaction and administrative costs they generate [10,16,45–48].
The debate appears polarized around a series of recurrent aspects [49], including those that describe SIBs as a model framed to transfer risks usually retained by the public sector to the private sector. Under this perspective, only a limited number of works have tried to understand the risks related to similar initiatives [5,42,50–52].

The SIB Market: Characteristics and Trends
According to the Social Finance Database [12], 137 Impact Bonds (IBs) have been launched around the world (with a total amount of raised capital of about $440 M). More in detail, 130 of these are SIBs, whereas 7 can be defined as Development Impact Bonds. Table 1 shows the number of SIBs implemented by country. The SIBs launched cover several social issue areas (e.g., workforce development, housing/homelessness, health, child and family welfare, criminal justice, education and early years, and poverty and environment).
According to the Government Outcomes Lab [53] (GO Lab), 34 SIBs have been completed in nine different countries. Among these, the "HMP Peterborough (The One Service)" and the "NYC Adolescent Behavioral Learning Experience Project for Incarcerated Youth (NYC ABLE)" were closed before completion. Table A1 (see Appendix A.1) provides an overview in terms of the "socio-demographic" data of the closed SIBs, with a focus on the investors involved in each SIB project. Table 2 shows the capital raised and the number of SIBs launched/completed by social issue worldwide.

Methodological Approach
In this work we attempt to measure the uncertainty of SIBs by combining a multi-criteria decision method with a process for constructing composite indicators. We test a model that is an extension and an improvement of [5]. In [5], social uncertainty evaluation in SIBs was hierarchically modeled with 3 main categories, 5 factors, and 16 sub-factors. Scognamiglio et al. [5], in turn, referred to Serrano-Cinca and Gutierrez-Nieto [40], where the goodness of an investment in Social Venture Capital (SVC) was evaluated through a model hierarchically structured into 3 principal factors, 26 criteria, and 160 indicators, as suggested by a small group of SVC analysts and academics. Furthermore, the same model structure as in [5] was used in Scognamiglio et al. [22] to explore the relationship between social risk and financial return within the context of SIBs.
Starting from these previous works, our model uses the multi-criteria decision method, the aggregation techniques, and the measuring scales in a different manner.
Among the different multi-criteria decision methods, AHP is suitable for its versatility in different fields (see Saaty [39]) and for solving complex problems by disassembling the phenomenon into criteria (e.g., categories, factors, and sub-factors) and assigning a priority to each of them.
In order to design the model, an SIB uncertainty score is produced through an aggregation process where the elements, similar to [5] in type, number, and position, are placed over a three-level tree diagram. At the first level, there are three main categories; at the second level, five factors are divided into two groups, and two indicators are linked directly to an upper category; at the third level, 14 indicators are grouped around the upper factors. Figure 2 shows the tree diagram used in detail.
In our study, unlike [40], AHP is applied only to the first two levels of the tree diagram (categories and factors) and, differently from [5], it is repeated separately for groups of the same level. The computation of weights for the different levels allows us to obtain a final score of immediate interpretation without using the comparisons typical of AHP. Furthermore, computing the weights for the different groups separately reduces the error in terms of consistency of judgments (see Appendix B.1).
The exploratory nature of our analysis and the absence of a large dataset required the use of alternative statistical techniques (with respect to classical ones such as factor analysis, cluster analysis, etc.). In addition, it is worth emphasizing that these classical techniques require strong assumptions that do not hold in our context (e.g., normality conditions).
Our analysis also included pilot interviews (via questionnaire) with scholars with extensive experience in the SIB field, in order to compare all pairs of elements belonging to the same hierarchical level and concerning the same upper aspect. Table 3 presents the experts' pairwise comparisons among the criteria and the weight of each criterion, in accordance with Figure 2.
In particular, the comparisons were made by the experts for groups of the same level, using the 9-point Saaty scale (in which 1 = "two criteria are equally important" and 9 = "the first criterion is absolutely more important than the second"; 3, 5, and 7 denote slight, medium, and strong importance, respectively, while 1/3, 1/5, 1/7, and 1/9 assume the reciprocal sense). The results are the entries of the so-called Pairwise Comparison Matrices (PCMs) (see the matrices in columns 3, 5, and 7 of Table 3). Each PCM then produces the priorities of the criteria involved (see the percentages in columns 4, 6, and 8 of Table 3). Thus, for example, if we consider how expert #R1 evaluates the categories, then her PCM can be read as follows: "Program Process" and "Contractual Condition" contribute equally to uncertainty (value = 1); both "Program Process" and "Contractual Condition" are strongly more uncertain than "Evaluation" (value = 5); and, by contrast, "Evaluation" is strongly more certain (value = 1/5) than both "Program Process" and "Contractual Condition". From this PCM, the priorities of "Program Process", "Contractual Condition", and "Evaluation" are 45.5%, 45.5%, and 9.1%, respectively.

The composite indicators have been constructed by using the min-max normalization procedure with an objective weighting method and a weighted mean as the aggregation function.
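The derivation of priorities from expert #R1's category PCM can be sketched as follows, using the principal eigenvector method (a minimal sketch in NumPy; the matrix entries are read from Table 3 as described above):

```python
import numpy as np

def ahp_priorities(pcm):
    """Priorities of a pairwise comparison matrix via its principal eigenvector."""
    pcm = np.asarray(pcm, dtype=float)
    eigvals, eigvecs = np.linalg.eig(pcm)
    k = int(np.argmax(eigvals.real))      # index of the principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                       # normalize so the priorities sum to 1
    n = pcm.shape[0]
    ci = (eigvals[k].real - n) / (n - 1)  # Saaty's consistency index
    return w, ci

# Expert #R1's PCM over the three categories:
# "Program Process" vs "Contractual Condition" = 1; both vs "Evaluation" = 5.
pcm_r1 = [[1, 1, 5],
          [1, 1, 5],
          [1/5, 1/5, 1]]
w, ci = ahp_priorities(pcm_r1)
print(np.round(w, 3))  # -> [0.455 0.455 0.091]
```

Since the rows of this PCM are proportional, the matrix is perfectly consistent (consistency index 0), and the eigenvector reproduces the priorities 45.5%, 45.5%, and 9.1% reported above.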
In particular, the min-max normalization method (see [54], Table 3, p. 30) is a linear transformation that scales the data between 0 and 1, subtracting the minimum value and then dividing by the range of the collected data. This type of normalization frees all indicators from their unit of measurement and allows us to flag unnecessary indicators, which are then excluded from the subsequent computation.
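A minimal sketch of this normalization step follows; the flagging of a zero-range (constant) indicator as non-informative mirrors the treatment of indicator "P" described later, while the numeric values are illustrative:

```python
def min_max_normalize(values):
    """Scale raw indicator values linearly to [0, 1].

    A constant indicator has zero range, so its normalization is
    undefined; it is flagged (None) and dropped from the computation.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return None  # degenerate indicator: mark as unnecessary
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([2, 4, 3, 2]))  # -> [0.0, 1.0, 0.5, 0.0]
print(min_max_normalize([1, 1, 1, 1]))  # -> None
```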
The weighting procedure returns an objective value proportional to the length of the measuring scale used and close to the ideal one. In fact, unlike [5], a wider 5-point Likert scale (1 = randomized controlled trial, 2 = quasi-experimental, 3 = validated administrative data, 4 = historical comparison, and 5 = unavailable) is used to measure the indicator "Evaluation methodology". In this case, the additional and better specified attributes (one more than [5], with a well-balanced scale as suggested by GO Lab) allow us to partially offset the structural gap among categories. In particular, with regard to the distribution of indicators (12.5% in "Evaluation", 31.3% in "Contractual conditions", and 56.3% in "Program process"), "Evaluation methodology" has a 66.7% priority in its category, 6.67% more than its corresponding priority in [5]. This positive variation, multiplied by the difference of the scores in "Evaluation", contributes to improving our model. Moreover, involving the indicators with the longest scales ("Evaluation methodology" and "Number of Outcomes"), this variation is fundamental for the "Evaluation" category and, more generally, provides a better basis for capturing the uncertainty.
In conclusion, SIB uncertainty is a number ranging from 0 to 1 (with 0 = absolute certainty and 1 = absolute uncertainty).

The Model
We start from the tree diagram shown in Figure 2 to explain the model. First, graphic symbols specify the position of each element involved and its relationships with the others. Then, for each group of elements of the same level, we build the PCMs and, after proving sufficient consistency for each of them, estimate the weights using the principal eigenvector method; the ranking thus obtained is used only when the procedure arrives at the level where these elements are. Finally, in the total aggregation phase, the scores of the indicators which refer to the same aspect are aggregated to obtain the composite indicators; the aggregative process then continues, going up to the next level, while taking into account the weights previously computed from the related PCMs. When the first level (the goal) is reached, the process ends, and the final score thus obtained gives us the measure of uncertainty. For more clarification on the mathematical tools and the model, see Appendices B and C.
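The bottom-up aggregation over the tree can be sketched as follows. This is a minimal illustration, not the paper's implementation: the node names echo Figure 2, but "Payment terms" and all weights and leaf scores are hypothetical placeholders.

```python
# Hypothetical fragment of a weighted tree: each entry is (weight, child),
# where a child is either a nested dict (internal node) or a leaf score in [0, 1].
tree = {
    "Program Process": (0.455, {
        "Program Features": (0.5, 0.50),
        "Intermediary":     (0.5, 0.20),
    }),
    "Contractual Condition": (0.455, {
        "Payment terms": (1.0, 0.40),   # illustrative factor name
    }),
    "Evaluation": (0.091, {
        "Evaluation methodology": (1.0, 0.10),
    }),
}

def score(node):
    """Aggregate bottom-up: an internal node is the weighted mean of its children."""
    if isinstance(node, dict):
        return sum(w * score(child) for w, child in node.values())
    return node  # a leaf carries a normalized indicator score

uncertainty = score(tree)
print(round(uncertainty, 4))
```

With these illustrative numbers, the two factors under "Program Process" aggregate to 0.35, and the weighted sum over the three categories yields the final uncertainty score.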

Sample and Data Collection
For this pilot analysis, a practical application was tested by using a "population" of SIBs (see Appendix A). The sample was not randomly selected. More in detail, the SIBs had to be closed (as of the end of October 2019) and had to comply with requirements of transparency and verifiability of information.
To evaluate the model (specified in the previous sections) and its usefulness, we tested it by using data collected from 34 SIBs. More in detail, we tested the model using two types of data: objective, publicly available data (through the Government Outcomes Lab (GO Lab) database and other sources) relative to the scales and units of the indicators, and subjective data obtained through a preliminary survey (via questionnaire) to elicit the priorities among the factors and categories of our factor tree.
Two junior researchers separately selected publicly available material on online databases and publicly available material from other sources. Subsequently, senior researchers checked the information drawn up and created a protocol to use relevant information for analysis, ensuring research quality and rigor.
With regard to the survey, we selected the experts using the following criteria:
Criterion 1: the experts on SIBs are selected among those who have authored or co-authored peer-reviewed books or book chapters on the topic area;
Criterion 2: the experts have published at least one research article evaluating and/or measuring aspects of SIBs in prestigious international journals.
In August and September 2019, through Google Scholar (using search terms in line with Criteria 1 and 2), we found a total of 26 potential works in line with Criterion 1 and only 10 in line with Criterion 2. Experts were selected if they met at least one of the criteria. Finally, 3 experts were selected (11.5% in line with the first criterion and 33.3% with both criteria). From a statistical point of view, the size of this sample guarantees a standard error of ±7.2% at a 99.5% confidence level on the reliability of the prioritizing method.
Furthermore, the experts were also consulted to refine the structure of the diagram tree.

The Sample: A Focus on Key Characteristics
We have analyzed, for each of the 34 completed SIBs, the following key characteristics:
• Duration: length of service provision (expressed in years);
• Financial resources:
 - Max outcome payments: total outcome payments (expressed in home country currency);
 - Capital raised: capital invested by the investors (expressed in home country currency);
• Service provider: the private for-profit or not-for-profit organization delivering social services;
• Number of outcomes: number of outcomes defined by the project;
• Target population: people to whom an outcome is targeted;
• Intermediaries: subjects involved in coordinating investor(s), service provider(s), and evaluator(s).
We assigned a number (from 1 to 34) for each of the 34 SIBs analyzed in our work in order to simplify the elaboration and presentation of our study (see Appendix A).
These 34 completed SIBs at the end of October 2019, according to the GO Lab Database, represent about 25% of the total number of SIBs launched worldwide.
With regard to our "population", detailed information about these key characteristics of the SIBs analyzed in our model is reported in Table A2 in Appendix A.
Nevertheless, our "population" shows some limitations regarding "capital raised" and "issue area".
In terms of "capital raised", our "population" accounts for about $50 million of the roughly $440 million raised overall. These 34 completed SIBs thus show, on average, a low amount of "capital raised".
According to Social Finance UK, the major investments have been made in the United States since 2013, where 26 projects corresponding to $219 million of "capital raised" were launched. These projects covered vast territories and/or a high number of people. These big projects are still being implemented and should be finished within the next few years. Only one of them (the NYC Adolescent Behavioral Learning Experience Project for Incarcerated Youth) has been concluded and was included in our "population".
Comparing the "issue area" characteristics of our sample with all of the SIBs launched around the world, we noted that our "population" lacks two new important issue areas, according to the categories defined by Social Finance: "health" and "poverty and environment".
SIBs for the health sector are called Health Impact Bonds (HIBs); they represent a new method of financing the health sector, which has always been characterized by financial deficits, as observed by Rizzello et al. [8]. HIBs have been developed in many parts of the world since 2015, not only in developing countries but also in industrialized ones. Currently, 22 HIBs have been launched in countries all over the world. According to the GO Lab database, these projects are still being implemented and will be concluded within a few years.
Three SIBs addressing the "Poverty and environment" issue were launched between 2016 and 2019, in the USA, The Netherlands, and Uganda, and will be concluded in the next few years. Currently, these SIBs are in the implementation phase, according to the GO Lab database.
In our sample, most of the SIBs analyzed were concerned with "Workforce Development" and "Homelessness" social issues (see Table 2).
For an overview in terms of the "socio-demographic" data of the sample with a focus on the investors involved in each SIB project, see Table A1 in Appendix A.1.

Practical Application
The model in Section 3.2 was tested on the 34 completed SIBs. Throughout this work, we use the notation introduced in Section 3.2; Table 4 presents the numerical values of m, Λ, k, and G according to Figure 2.
The uncertainty score of each SIB has been calculated from the data collected (see Table 5, variables A, B, ..., R) following the phases described in Section 3.2.
The data were normalized using formula (A6) to obtain basic homogeneous indicators, labeled by the italic letters running from A to R (see Appendix D, Table A5). In our study, the variable "P" is constant and assumes the minimum value (=1), so its normalization loses sense (see Appendix D, Table A5, label N/A), which allows us to mark it as a variable not useful for the computation. As noted in Section 3.1, this can be explained as follows: "Social Variation" is a consolidated practice for all SIBs and is also less significant than other indicators for measuring SIB uncertainty.
We then horizontally aggregate the normalized data (see Appendix C, Figure A2 and formula (A12)) for each group, thus obtaining the composite indicators. For example, if the variables A, B, and C are selected and belong to the same group (see column 4 and row 4 of Table 4), their aggregation determines the indicator, or more precisely, the factor A + B + C = "Program Features". In the case of SIB number 29, the value is A + B + C = 0.50 × 0.50 + 0.00 × 0.25 + 1.00 × 0.25 = 0.500.
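The worked example above is a plain weighted mean, which can be sketched as follows (the indicator values and weights are those of the SIB number 29 example; the function name is ours):

```python
def aggregate(scores, weights):
    """Weighted mean of normalized indicator scores; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(s * w for s, w in zip(scores, weights))

# Indicators A, B, C of SIB number 29 with their priorities:
program_features = aggregate([0.50, 0.00, 1.00], [0.50, 0.25, 0.25])
print(program_features)  # -> 0.5
```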
In our tree diagram, the majority of the indicators are placed at the third level (see row 4 of Table 4 in correspondence with n). Therefore, the resulting composite indicators are the factors placed at the second level (see Appendix D, Table A6).
After deleting P, the resulting composite indicator "Local Context Variation" is computed from the aggregation of only N and O. The variables Q and R lie on the second level (see also n = 2 in Table 4); hence, the resulting composite indicator is the category "Evaluation".

Model Validation
We validated the model starting from the three final scores provided by our experts (see Table 6). The question is whether the differences among them are such as to affect the model and the construction of the final score (see formula (A19)). In this respect, we conducted a descriptive analysis and a robustness analysis.

Descriptive Statistics
The final score data of #R1 and #R2 have a moderate range (0.3343 and 0.2616, respectively), excluding the outliers (SIBs 1, 29, and 32 for #R1 and SIBs 7, 14, and 23 for #R2); they cluster around the mean (0.4269 and 0.3900, respectively), having a low standard deviation (0.0765 and 0.0685, respectively). By contrast, the data of #R3 are highly variegated (std. dev. of 0.1045) but tend on average towards a lower value of uncertainty (mean of 0.3268); they are also evenly distributed within the interquartile range (IQR) and along the whiskers, and none of them are outliers.
Finally, we observe that the IQRs of #R1 and #R2 contain SIBs with moderate uncertainty, while the IQR of #R3, ending at a lower value than others (percentile 75 = 0.3631), contains SIBs that are quite certain.
The main statistics are presented in Table 7. We use the Spearman correlation (ρ) and the absolute mean difference (|∆r|) to measure the pairwise diversity among the three datasets.

Robustness Check
We examined the robustness of the model by assessing the extent to which the results are affected by the indicators used. In the present case, we compared the final score with the scores obtained by omitting one indicator at a time.
In Table 8, we present the results of this sensitivity analysis, conducted for each expert by using the Spearman coefficient. The final scoring is substantially stable, with the exception of one indicator for #R2 (L) and #R3 (F), and of two indicators for #R1 (L and M). For every expert, the largest contribution to the final scoring is principally given by the indicators in "Local Context Variation" and "Evaluation" (correlation close to 1.000). To a lesser extent, the uncertainty depends on the indicators in "Program Features" and "Intermediary" (correlations around 0.990 and 0.930, respectively). Finally, the remaining indicators participate moderately in the score, according to the expert considered.
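The leave-one-indicator-out check can be sketched as follows. This is a minimal illustration under assumed data, not the paper's code: the toy dataset and weights are hypothetical, and the Spearman coefficient is computed from scratch (average ranks for ties) to keep the sketch self-contained.

```python
def _ranks(xs):
    """Average ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def leave_one_out(data, weights):
    """Recompute the final scores omitting each indicator in turn and
    return the Spearman correlation with the full final scores."""
    def score(row, w):
        return sum(v * u for v, u in zip(row, w)) / sum(w)
    full = [score(row, weights) for row in data]
    rhos = []
    for j in range(len(weights)):
        w = weights[:j] + weights[j + 1:]
        reduced = [score(row[:j] + row[j + 1:], w) for row in data]
        rhos.append(spearman(full, reduced))
    return rhos

# Toy example: 5 SIBs, 3 indicators; the third indicator is constant,
# so omitting it leaves the ranking unchanged (rho = 1).
data = [[0.1, 0.9, 0.5], [0.4, 0.2, 0.5], [0.8, 0.7, 0.5],
        [0.3, 0.6, 0.5], [0.9, 0.1, 0.5]]
print(leave_one_out(data, [0.5, 0.4, 0.1]))
```

A correlation close to 1 for the score obtained without a given indicator means that indicator contributes little to the final ranking, which is the reading applied to Table 8.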

Results from Model Application: Some Non-Relevant Indicators
Our analysis highlights the presence of indicators without relevance. As previously mentioned in Section 4.1, indicator "P" has the same value for all of the 34 SIBs and refers to the question "is the program developed in the same social and local context?". This indicator is probably not relevant to our analysis because SIBs are usually developed for a specific project, to be implemented in a specific area in which the social need has been previously detected and where the scope for social variation is limited. Thus, to avoid differences among the different cohorts in which the program will be implemented, these cohorts should have the same "social or local" characteristics.
An indicator that is strictly connected to our indicator "P", even though it belongs to the factor "Local Context Variation", is indicator "M" (Variation of typology). As summarized in Table 5, indicator "M" shows the same value (=2) for all of the SIBs in our sample except for SIBs 12, 13, 14, and 27 (=1). Indicator "M" refers to the question "does the contract permit the ongoing variation of target typology?". Like indicator "P", indicator "M" is constant (except in four cases). Overall, this means that 88% of our cases do not show any kind of ongoing variation of target typology during the implementation stage and, as in the case of indicator "P", the reason lies in the fact that these two aspects are often tightly defined in the contract.
In order to better understand this aspect, we linked our result to the case of the Peterborough SIB. This SIB was implemented by the UK Ministry of Justice to provide 'through-the-gate' and post-release support to prisoners serving short sentences, with the aim of reducing reoffending in a group of 3000 adult males (aged 18 or over) who received custodial sentences of less than 12 months [44,49]. As clarified by FitzGerald et al. [49], the cohort was very tightly defined, composed only of individuals with prison sentences of less than 12 months, and everyone eligible was contacted by the delivery organization[s], even though participation was voluntary. The eligibility criteria for prisoners to enter the cohort were defined in the contract.
Overall, this implies that the possibility of target variation should have been included in the contract. This aspect is the same for 88% of our sample. The case of Peterborough is also useful to explain the results for indicator "O" (Public policy variation). Looking at Table 5, indicator "O" shows a different value only for the Peterborough SIB (value = 1), which could be considered a unique case, given that it was the first to be implemented and the only one in which a policy reform caused the cancellation of the third cohort [42,58].
Other indicators with limited informative power are the pilot phase (B), empirical evidence (C), and years of experience of the service provider (G). Regarding indicator "B", only in the "London Homelessness Social Impact Bond (Thames Reach)" SIB is the value different (value = 2) from the others (value = 1), while regarding indicator "C", only in the SIBs "DWP Innovation Fund Round II-Thames Valley (Energise)", "DWP Innovation Fund Round II-West London (Prevista)", and "the Benevolent Society Social Benefit Bond" is the value different (value = 1) from the others (value = 2).
Indicator "B" corresponds to the question "is a pilot phase present?" and it differs from the others only because this information is not disclosed or not available for this SIB. Finally, both indicators "B" and "C" belong to the factor "Program Features". This implies that the factor "Program Features" is almost entirely explained by indicator "A" (program duration).
Regarding indicator "G", which corresponds to the question "how many years of experience do the older service providers involved in the program have?", it assumes a different value (value = 2) from the others (value = 1) only in the SIBs "London Homelessness Social Impact Bond (St Mungo's/Street Impact)" and "Fair Chance Fund-West Yorkshire (Fusion Housing)".

Differences between Experts' Preferences and Weights and Uncertainty Scores
Responses provided by our three respondents (#R1, #R2, #R3) highlight differences both in the categories and factors suggested (see Table 3).
As synthesized in Table 3, regarding our categories, #R1 and #R2 showed the same level of agreement across the three categories, while #R3 showed a high preference for the first category ("Program Process") and a lower preference (11.1%) for the second and third. This means that for #R3 the first category was more important than the other two. Regarding our factors, the preferences of #R2 were equally distributed between "Program Features", "Service Provider", and "Intermediary" (0.50), and between "Target Variation" and "Local Context Variation". By contrast, #R3 provided a different perspective by reducing the importance of the factors "Program Features" and "Local Context Variation". Differences in respondents' weights and preferences correspond to a high variance among the overall scores for each of the SIBs. Table 9 summarizes the SIBs with the highest level of uncertainty for each respondent.
As emphasized in Table 9, #R1 and #R2 show a high level of uncertainty for the following SIBs: "Buzinezzclub Programme (Rotterdam)", "Academia de Còdigo Jùnior Lisbon", and "Perspective: Work". #R2 and #R3 show a high level of uncertainty for the following SIBs: "Youth Engagement Fund-Unlocking Potential/Career Connect (Greater Merseyside)" and "DWP Innovation Fund Round I-East London (Links for Life)". No SIBs are shared between #R1 and #R3.
It is interesting to note that #R1 assigned the second highest level of uncertainty to the HMP Peterborough SIB which, as highlighted previously, was the first SIB implemented and the only one discontinued due to a policy reform. This appears to be in line with the preferences, in terms of weights, provided by #R1.
Comparing the lowest scores provided by our respondents, summarized in Table 10, we found that #R1, #R2, and #R3 converge towards the "DWP Innovation Fund Round II-West London (Prevista)" SIB, which is also the SIB with the lowest uncertainty score (and the lowest score overall). At the same time, #R1 and #R2 show a similar score for the following SIBs: "NYC Adolescent Behavioral Learning Experience Project for Incarcerated Youth" and "DWP Innovation Fund Round II-West London (Prevista)", while #R2 and #R3 show a similar score for the following SIBs: "Fair Chance Fund-Birmingham (Rewriting Futures/St Basil's)", "Fair Chance Fund-Liverpool (Local Solutions)", and "Youth Engagement Fund-Prevista (London)". By comparing Table 9 and Table 10, it emerges that while #R1 and #R2 consider the "Buzinezzclub Programme (Rotterdam)" SIB to be the most uncertain, #R3 considers it less uncertain (27%); conversely, while #R1 considers the "London Homelessness Social Impact Bond (Thames Reach)" SIB one of the least uncertain, #R3 considers it the most uncertain. The different weights and importances justify these differences (Table 3). Table 10 highlights a further aspect. In the lists of SIBs with the lowest level of uncertainty, #R1 and #R2 included the "NYC Adolescent Behavioral Learning Experience Project for Incarcerated Youth". However, this SIB was terminated after three years of service delivery, when the recidivism rates of the first cohort showed no significant improvement compared with historical rates [59]. This immediately suggests that the weights and preferences provided by #R1 and #R2 probably focused on aspects (in the factor tree) that do not capture the possibility that SIBs may fail.
This aspect is particularly striking. Going back to the respondents' preferences (see Section 4.3.2 and Table 3), we found that while #R1 and #R2 gave the same attention to the three categories ("Program Process", "Contractual Conditions", and "Evaluation"), #R3 showed a high preference for the first category. This probably means that #R3 did not consider this SIB less uncertain because of the attention given to the "Program Process" category, and thus did not consider the possibility that design deficiencies could affect the success of the entire project.

Discussion of Results
The results obtained in the analysis highlight the weakness of the factors tree initially developed by Scognamiglio et al. [5]. The analysis revealed the presence of a high number of variables that are not able to capture uncertainty in an SIB project.
Nevertheless, the comparison between our uncertainty scores and the main characteristics of failed SIBs allowed us to detect the need for further variables. Among closed SIBs, the "NYC Adolescent Behavioral Learning Experience Project for Incarcerated Youth" was considered by two respondents to be an SIB with a low level of uncertainty. The project was launched by Goldman Sachs Bank's Urban Investment Group (UIG), which announced a $9.6 million loan to support the delivery of the program to a predefined target population. The program was terminated after three years due to the lack of results. Similarly, the SIB "HMP Peterborough (The One Service)" was discontinued due to the "Transforming Rehabilitation" reforms to probation in the United Kingdom.
The "NYC Adolescent Behavioral Learning Experience Project for Incarcerated Youth" was the first SIB launched in the US and was guaranteed by Bloomberg Philanthropies, substantially reducing the downside risk. Conversely, the SIB "HMP Peterborough (The One Service)" was the first SIB launched worldwide and the first attempt to improve a funding mechanism for reducing recidivism among prisoners.
The ex-post analysis of these two SIBs led us to understand that further variables are required in order to capture uncertainty. In particular, further variables should consider the presence of guarantee schemes, the type of social issue addressed, and the voluntary or involuntary engagement of the target population by focusing not only on the contractual conditions of the SIB project but also on the "financial" and "social issue" characteristics.
Moreover, our analysis starts from the definition provided by Nicholls and Tomkinson [27], in which social uncertainty captures endogenous (e.g., organizational/responsiveness) and exogenous (e.g., political, economic, social, technological, and environmental) factors, both of which are negatively correlated with social return. In the case of SIBs, the higher the risk of change in the context in which the social program is delivered, the lower the social return that can be achieved. If the context of a social program changes, as in the case of Peterborough, it is preferable to discontinue the entire program rather than invest further effort, renouncing the prospect of achieving a lower level of social return (with the related reputational effects). Similarly, future projects in which there is a likelihood of contextual change will fail due to the weaker incentives for their implementation. In the same way, with regard to endogenous factors, the likelihood that contractors or staff achieve less than optimal results will lead to changes or to the termination of the contract before detrimental outcomes arise (including reputational ones).
In conclusion, the lack of complete information about the remaining 32 closed SIBs does not allow us to understand whether our score is effectively able to capture their level of uncertainty or whether their risk management processes were able to detect the manifestation of their potential uncertainty factors early. Nevertheless, our analysis reveals that, in order to capture the overall uncertainty related to an SIB, further risk factors should be considered, distinguishing between those strictly related to the contractual characteristics of the project (e.g., financial and social issues) and those related to endogenous/exogenous factors that have not been explored before.
These preliminary findings open new avenues for future research in the field of uncertainty and risk factors in the social impact bond landscape.

Concluding Remarks, Implications, and Future Research Directions
Our analysis focused on social uncertainty, meant as the possibility that an SIB may not reach the intended impact or generate the expected social outcomes. The study was conducted in order to apply the model previously developed by Scognamiglio et al. [5] in a practical context, trying to improve it and to understand if and how it is able to reveal uncertainty in SIB projects. By using an "ex-post" approach and the population of closed SIBs, we performed a retrospective test of social uncertainty considering a given set of variables, factors, and indicators. The results reveal a high number of non-relevant indicators. Further analyses are required in order to transform this retrospective approach into a predictive, and thus standardized, approach useful for evaluating ex ante the uncertainty level of each newly proposed SIB project.
Moreover, while several scholars have pointed out the main characteristics of SIBs, their enabling factors, and the main opportunities arising from their implementation, it has remained unclear how to evaluate their level of uncertainty. We have attempted to address this knowledge gap through this work. At the same time, this work contributes to the field of alternative methodological approaches for sustainable finance. From this perspective, we used an alternative methodological approach that cannot be traced back to the methods of mainstream finance in order to estimate the ability of SIBs to achieve outcomes. This implies not only a methodological shift related to the use of the AHP method, typically used in the social sciences, but also a shift in the conceptual foundation of finance research, which is typically oriented towards more quantitative aspects and less interested in social purposes.
From a more practical point of view, this work could potentially contribute to the development of the entire SIB market by allowing both practitioners and policy makers to understand the areas from which uncertainty arises and thus helping the development of more focused risk management practices and more standardized SIBs schemes.
Although the findings are encouraging, a few limitations need to be considered when interpreting the results of this study. The first limitation is related to the number of experts interviewed. More respondents would have provided more information and allowed more comparisons, enabling us to refine the initial factor tree further. However, considering the exploratory nature of this study and especially our initial aim of testing the uncertainty model in a retrospective manner, this should not be considered a point of weakness: some non-relevant indicators have been detected as needing further refinement. Future development of the model should take this aspect as a starting point, considering the opportunity to enlarge the sample of respondents; in addition, mixing qualitative and quantitative comparisons would help to clarify the validity of the model and support the development of a more fine-tuned weighting scale to reduce subjective biases. The second limitation refers to the fact that we did not consider any interactions between indicators: the simultaneous presence of several elements could potentially reduce or increase the overall level of uncertainty. Finally, our analysis does not consider the presence of a "residual risk" that the model may not be able to capture, nor the possibility that uncertainty cannot be explained without considering aspects such as opportunistic behavior and information asymmetries among the involved parties.

Acknowledgments:
The authors would like to express sincere gratitude for the valuable comments and suggestions received from the interviewed colleagues. The authors would also like to thank the participants of the 3rd Social Impact Investments International Conference (5-6 December 2019, Rome) for their inspiring contributions. Finally, the authors are grateful to the three anonymous reviewers who made the effort to review this work and contributed to its improvement during the peer-review phase.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. The Sample
In this section, the sample is described in terms of both socio-demographic and financial characteristics.
Appendix A.1. The Sample: An Overview
Table A1 provides an overview of the "socio-demographic" data of the sample, with a focus on the investors involved in each SIB project. In addition, the first column shows the number we assigned to each SIB (through which we identify each unit of our sample). The last column describes the structure of the SIBs, according to the Goodall [55] approach, which identifies three distinct types: (1) direct, (2) intermediated, and (3) managed. Table A2 gives detailed publicly available information on the 34 completed SIBs analyzed in our work. Source: our elaboration from publicly available information.

Appendix B. Mathematical Miscellanies
This section presents some mathematical definitions, theorems and procedures useful for the present study.

Appendix B.1. AHP Method
An n × n real matrix A = (a_ij) is said to be positive if a_ij > 0 for all i, j = 1, 2, ..., n; moreover, if a_ji = 1/a_ij for all i, j = 1, 2, ..., n, then A is said to be reciprocal. A positive reciprocal matrix A is said to be perfectly consistent (or consistent in Saaty's sense) if a_ik a_kj = a_ij for all i, j, k = 1, 2, ..., n.

Theorem A1 (Saaty [39]). For any positive reciprocal matrix A of size n, the largest (or principal) eigenvalue λ_max of A exists and is unique; that is, λ_max is the number λ of maximum value solving the eigenvalue problem

Aw = λw, (A1)

where w = (w_1, w_2, ..., w_n) is the right eigenvector associated with λ_max. Furthermore, the following hold for λ_max and w:

λ_max ≥ n, (A2)

w_i > 0 for all i = 1, 2, ..., n and ∑_{i=1}^n w_i = 1. (A3)

Corollary A1. Let A be a positive reciprocal matrix. If A is perfectly consistent, then

a_ij = w_i / w_j for all i, j = 1, 2, ..., n, (A4)

where w_i and w_j are respectively the i-th and j-th components of w.
Corollary A2. Any A positive reciprocal matrix of order 2 is perfectly consistent.
Any positive reciprocal matrix A with a_ij ∈ N and a_ij ≤ 9 is said to be a Pairwise Comparison Matrix (PCM) in an Analytic Hierarchical Process. The entry a_ij of A represents the judgment of the decision maker (DM) when comparing alternative i with alternative j on a 1-to-9 scale (see Table A3). The right eigenvector w, characterized by (A1) and (A3), is called the priority vector or weight vector and represents the importance that the DM assigns to the various alternatives.
The Consistency Ratio is the number CR := CI/RI, where CI := (λ_max − n)/(n − 1) is the consistency index and RI is the random consistency index, whose values (see Table A4) depend on n.
A PCM A is said to be sufficiently consistent if CR < 0.05 for n = 3, CR < 0.08 for n = 4, and CR < 0.1 for n ≥ 5.

Table A3. Meaning of a_ij in AHP.

Value of a_ij   Interpretation
1               i and j are equally important
3               i is slightly more important than j
5               i is more important than j
7               i is strongly more important than j
9               i is absolutely more important than j
1/3             i is scarcely less important than j
1/5             i is less important than j
1/7             i is far less important than j
1/9             i is absolutely less important than j
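To make the AHP machinery concrete, the following minimal Python sketch computes the principal eigenvalue, the normalized priority vector, and the consistency ratio of a toy 3 × 3 PCM. The matrix values and the truncated RI table are standard illustrative choices, not taken from the paper's questionnaires.

```python
# AHP sketch: principal eigenvector and consistency ratio of a toy PCM.
import numpy as np

# Saaty's random consistency index RI for n = 1..6 (standard values)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}

def ahp_weights(A):
    """Return (priority vector, lambda_max, consistency ratio) of a PCM."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)          # principal eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                      # normalized priority vector (A3)
    ci = (lam_max - n) / (n - 1)         # consistency index
    cr = ci / RI[n]                      # consistency ratio (n >= 3)
    return w, lam_max, cr

# Toy pairwise comparison matrix on Saaty's 1-9 scale
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w, lam, cr = ahp_weights(A)
print(w, lam, cr)   # weights sum to 1; CR < 0.05 is sufficiently consistent for n = 3
```

This matrix is nearly consistent (a_12 · a_23 = 6 against a_13 = 5), so λ_max is only slightly above n and the CR is well below the 0.05 threshold.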

Appendix B.2. Weak Composition
Let n ∈ N and p ∈ N* with p ≤ n. An ordered sequence of p nonnegative integers λ_i, each less than or equal to n, such that ∑_{i=1}^p λ_i = n is said to be a weak composition of n into p integers [60] and is denoted by the symbol Λ(n, p). If all λ_i are positive, then Λ(n, p) is said to be a composition of n into p integers.
Example A3. Given n = 3 and p = 2, there exist 4 weak compositions of 3 into 2 integers: (0, 3), (1, 2), (2, 1), and (3, 0).
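The enumeration in Example A3 can be reproduced with a short recursive sketch (the function name is ours, introduced only for illustration):

```python
# Enumerate all weak compositions of n into p nonnegative integers.
def weak_compositions(n, p):
    """Ordered sequences of p nonnegative integers summing to n."""
    if p == 1:
        return [(n,)]
    return [(i,) + rest
            for i in range(n + 1)
            for rest in weak_compositions(n - i, p - 1)]

print(weak_compositions(3, 2))  # [(0, 3), (1, 2), (2, 1), (3, 0)]
```

In general there are C(n + p − 1, p − 1) weak compositions of n into p parts, which gives C(4, 1) = 4 for n = 3 and p = 2, matching Example A3.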

Appendix B.3. Transformation and Aggregation Functions
Let X = {x_1, x_2, ..., x_n} be a real numerical batch and let 𝒳 be the family of such batches. A function T : 𝒳 → 𝒳 that replaces each real numerical batch X = {x_1, x_2, ..., x_n} ∈ 𝒳 with the real numerical batch T(X) = (T(x_1), T(x_2), ..., T(x_n)) ∈ 𝒳 is said to be a transformation. A transformation T : 𝒳 → 𝒳 is said to be linear if T(X) = a + bX for some a, b ∈ R, for all X ∈ 𝒳.
Example A4. The transformation

T(x_i) = (x_i − min(X)) / (max(X) − min(X)),

where min(X) and max(X) are respectively the minimum and maximum values of X, is called min-max normalization. It is easy to check that it is a linear transformation and that 0 ≤ T(x_i) ≤ 1 for all i = 1, 2, ..., n.
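As a quick check of Example A4, min-max normalization can be sketched in a few lines (the batch values are invented):

```python
# Min-max normalization: maps each value of a batch into [0, 1].
def min_max(batch):
    lo, hi = min(batch), max(batch)
    return [(x - lo) / (hi - lo) for x in batch]

print(min_max([2.0, 4.0, 6.0, 10.0]))  # [0.0, 0.25, 0.5, 1.0]
```

The minimum of the batch maps to 0 and the maximum to 1, as the definition requires.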
A bijective transformation T : 𝒳 → 𝒳 such that distinct x_i and x_k of the same batch X correspond to distinct values T(x_i) and T(x_k) is said to be a scale of measurement for X and for all of its transforms. In addition, if, ordering the T(x_i) from smallest to largest, T(x_i) − T(x_{i−1}) = k for all i = 2, 3, ..., n, then the scale of measurement T is said to be at constant intervals.
A function F : 𝒳 → R that replaces a batch X ∈ 𝒳 with a real number is said to be an aggregative function.
Let W(X) = (W(x_1), W(x_2), ..., W(x_n)) be a transform of the batch X such that 0 ≤ W(x_i) ≤ 1 for all i = 1, 2, ..., n and ∑_{i=1}^n W(x_i) = 1. The aggregative functions

F(X) = (1/n) ∑_{i=1}^n x_i   and   F_W(X) = ∑_{i=1}^n W(x_i) x_i

are said to be respectively the mean of X and the mean of X with weights W(X).
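The two aggregative functions above can be sketched directly; the batch and weights are invented for illustration:

```python
# Mean and weighted mean of a batch, matching the definitions above.
def mean(batch):
    """Arithmetic mean of a batch."""
    return sum(batch) / len(batch)

def weighted_mean(batch, weights):
    """Mean of the batch with weights in [0, 1] summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * x for w, x in zip(weights, batch))

x = [0.2, 0.4, 0.6]
print(mean(x))                            # 0.4 (up to floating-point error)
print(weighted_mean(x, [0.5, 0.3, 0.2]))  # approximately 0.34
```

With equal weights W(x_i) = 1/n, the weighted mean reduces to the arithmetic mean, which is the case invoked later in Remark A2.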

Appendix C. Details of the Used Approach
In this section the aggregation method used and the model are described in mathematical terms.

Appendix C.1. A Flexible Aggregation
The linear combination of categories, factors, and composite indicators through the weights of the expert gives the final score. Such a process also allows us to obtain partial scorings, in both the horizontal and vertical directions.
Consider, for example, a phenomenon R depending on the aspects N_1 and N_2 (e.g., categories); N_1 is explained by the sub-aspects N_11 and N_12 (e.g., factors), respectively measured through the indicators i_11 and i_12; N_2 is, in turn, measured by the indicator i_2. Then, R is computed as the weighted combination of these elements, where ⊕ denotes the composition among diagram trees. Figures A1 and A2 show the computation of R choosing respectively the vertical or the horizontal direction. In the vertical procedure, R is given by the sum of the partial scores, each calculated independently of the others (see steps 2 and 3 of Figure A1, or equivalently steps (A9) and (A10)). In the horizontal case, R is instead given by the sum of the sums of the level scorings (see steps 3 and 5 of Figure A2, or equivalently steps (A13) and (A15)).
As suggested by the figures and formulas, the horizontal procedure saves computation time.

Figure A1. Vertical direction of computation.

Consider a finite number I of individuals (indexed by i ∈ I = {1, 2, ..., I}); later, these will be the completed SIBs. Let X be an I × N matrix, where the i-th row represents the information of individual i about all indicators and the n-th column represents the batch X_n referring to the n-th indicator. For convenience, each entry of X will also be labeled according both to the level ℓ at which the indicator sits and to the group of aspects involved, where L(ℓ) = {1, 2, ..., ℓ − 1}. Aggregating (A16) by using (A17), the relative scoring is then obtained, where k indexes the element of the overlying level that contains, among the elements of the ℓ-th level, the k-th one. By summing (A18), one obtains the final scoring, which is then given by (A19).

Remark A1. Formula (A19) also holds for a linear aggregation repeatedly applied over each branch of the diagram tree. In this case, the score of each composite indicator is multiplied by the weights (A17) found along the nodes of the branch (see Appendix C.1).

Remark A2. Given an indicator group G_k^(ℓ), if all of the indicators x_ink are treated on an equal range, then they have an equal weight given by w = 1/|G_k^(ℓ)|. Then, formula (A16) is the arithmetic mean of the normalized scores (see also formula (A7)).
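The vertical and horizontal directions of Appendix C.1 can be sketched on the toy tree used there (R depending on N_1, itself explained by N_11 and N_12 measured by i_11 and i_12, and on N_2 measured by i_2). All weights and indicator values below are hypothetical, chosen only to show that both directions yield the same final score by linearity:

```python
# Toy two-level aggregation: vertical vs. horizontal direction.
i11, i12, i2 = 0.4, 0.6, 0.5   # normalized indicator scores (invented)
w11, w12 = 0.7, 0.3            # factor weights within N1 (invented)
w1, w2 = 0.6, 0.4              # category weights (invented)

# Vertical direction: fully aggregate each branch, then combine branches.
N1 = w11 * i11 + w12 * i12
R_vertical = w1 * N1 + w2 * i2

# Horizontal direction: push category weights down to the leaves and
# sum the level scorings once.
R_horizontal = (w1 * w11) * i11 + (w1 * w12) * i12 + w2 * i2

print(R_vertical, R_horizontal)  # equal by linearity of the aggregation
```

The horizontal form multiplies the weights found along each branch once per leaf, which is why it saves computation when the tree has many levels.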

Appendix D. Computation of the Uncertainty Score for All Completed SIBs
In this section, we present the computation of final uncertainty scoring through some tables. In particular, Table A5 presents the normalized data and weights of the variables; Table A6 shows the objective aggregation to obtain the composite indicators; Table A7 presents the AHP-aggregation firstly made for the factors and secondly for the categories.