Multi-Dimensional Certification of Modern Distributed Systems

Cloud computing has deeply changed how distributed systems are engineered, leading to the proliferation of ever-evolving and complex environments where legacy systems, microservices, and nanoservices coexist. These services can severely impact individuals' security and safety, introducing the need for solutions that properly assess and verify their correct behavior. Security assurance stands out as the way to address such pressing needs, with certification techniques being used to certify that a given service holds some non-functional properties. However, existing techniques build their evaluation on software artifacts only, falling short of providing a thorough evaluation of the non-functional properties under certification. In this paper, we present a multi-dimensional certification scheme where additional dimensions model relevant aspects (e.g., programming languages and development processes) that significantly contribute to the quality of the certification results. Our multi-dimensional certification enables a new generation of service selection approaches capable of handling a variety of users' requirements over the full system life cycle, from system development to operation and maintenance. The performance and quality of our approach are thoroughly evaluated in several experiments.


INTRODUCTION
Modern distributed systems are based on (micro)services developed using cloud-native technologies, composed at run time with orchestration platforms, and continuously monitored to ensure scalability and elasticity. At the same time, (micro)services coexist with legacy systems consisting of large and difficult-to-maintain codebases, on one side, and nanoservices consisting of a few lines of code, on the other side, substantially increasing systems' complexity [1]. This scenario has radically changed the governance, risk, and compliance landscape, affecting the safety and security of people. Distributed systems in fact support virtually every transaction and process in our everyday life, calling for solutions that properly assess and verify their behavior and, in turn, their impact on the individual's personal sphere.
In the last decade, security assurance has been widely accepted as a means to model and assess the behavior of a distributed system, with the aim of increasing its trustworthiness [2]. Assurance techniques have been consistently applied, with certification schemes adopted to drive the evaluation of non-functional (e.g., confidentiality, integrity) properties of a given target system (e.g., software, service) [3], [4], [5], [6]. Notwithstanding the continuous research on certification [2], existing techniques are still inadequate and rarely applied to modern service-based scenarios. On the one hand, traditional schemes (e.g., Common Criteria) lack the flexibility to properly tackle the challenges of service-based systems, such as automatic service selection [7], [8]. On the other hand, service-specific schemes (e.g., [4], [5], [6]) fail to evaluate services in their entirety. They in fact build their evaluation on the final software artifacts only (i.e., the executable service), ignoring additional and relevant aspects of the target service, such as the development process [8]. This negatively affects the whole certification process, preventing existing approaches from retrieving a detailed and realistic description of the service and, in turn, reducing the accuracy of the certification results. This also negatively influences the life cycle management of distributed systems, since certified services are selected and composed on the basis of certified yet incomplete information.
Our paper aims to fill the above gaps by defining a novel certification scheme that expands the scope of certification beyond the simple evaluation of software artifacts. It considers additional aspects related to, for instance, development and verification processes. Our scheme groups these aspects into coherent dimensions, permitting the retrieval of a certified, complete, and well-structured picture of the target service and paving the way for fully-informed and safe decision-making. Our scheme supports a new wave of certification, where i) services are certified by considering each dimension independently; ii) each dimension can be managed according to its peculiarities (e.g., its life cycle); iii) non-functional properties are verified by integrating the certification results of each dimension. Our certification scheme is finally integrated within the distributed system life cycle management to support a certification-based service selection, where functionally-equivalent services are ranked and selected according to non-functional requirements defined by the users.
The contribution of our paper is twofold. We first define a multi-dimensional certification scheme, where the state-of-the-art evaluation of non-functional properties, focusing on software artifacts only, is extended to consider additional dimensions modeling relevant aspects that contribute to the quality of the certification results. To this aim, the proposed scheme introduces a novel certification model as the collector of all activities driving a service certification, where all its building blocks (i.e., non-functional properties, target of certification, evidence) are based on dimensions to enable modular certification. Our scheme certifies the properties of a given target service by composing the results of the evaluation in each dimension, producing a more accurate certificate of the service behavior. We then implement a certification-based service selection process, where services are ranked according to the certified non-functional properties they hold and the corresponding users' requirements, using a Multi-Criteria Decision-Making (MCDM) technique.
The remainder of this paper is organized as follows. Section 2 discusses the state-of-the-art and our motivations. Section 3 introduces our approach at a glance. Section 4 defines the certification model driving all certification activities, whose execution is described in Section 5. Section 6 details the methodology for ranking and selection of certified services. Section 7 presents an extensive experimental evaluation of the proposed certification scheme. Finally, Section 8 draws our conclusions.

BACKGROUND AND MOTIVATIONS
Certification schemes aim to verify whether a given system supports one or more non-functional properties and behaves as expected. Schemes are based on a certification model that specifies all the activities that have to be executed on the target of certification (e.g., a software or a service) to collect the evidence proving a given non-functional (e.g., confidentiality, integrity, reliability) property. If the collected evidence supports the given non-functional property, the certification scheme triggers the release of a certificate for the service, which is further used to comparatively select services according to non-functional requirements of end users. Certification schemes, in fact, have been commonly adopted to drive service ranking and selection according to certified non-functional properties and corresponding evidence in certificates [9].
The undeniable advantages brought by certification schemes conflict with some strong assumptions that limit their applicability and quality, as follows.
A1) Benign behavior of all involved parties. Service providers, end users, and certification authorities follow the certification scheme and its rules, resulting in certificates properly modeling the properties of the certified services. We note that some peculiar schemes relax this assumption [3], [26], [30], though they are out of the scope of this paper.

A2) Chain of trust rooted at the CA. The CA is trusted by design, meaning that it correctly produces and executes certification models driving the certification activities. Trust is propagated from the CA to the certificates, and is guaranteed by cryptographic signatures [6].

A3) Certificates awarded to services according to their software artifacts only. The certification scheme builds its evaluation, and the corresponding certificate award, on evidence collected by analyzing the software artifacts of the target of certification [3], [4], [6], [11]. In other words, evidence is collected by testing and monitoring the software artifacts of the target service, ignoring additional information coming from, for example, the development process.

A4) Software artifacts sufficient for optimal selection. Service ranking and selection are built on certified non-functional properties. The latter are evaluated according to corresponding evidence on software artifacts [9], [13], [16], [20], [21], [22], [23], [31] stored in certificates. Though partial, the evidence is assumed to be complete and to support accurate ranking.

Assumptions A1) and A2) limit the applicability of certification schemes to scenarios where all parties are benign and the chain of trust is rooted in a trusted party (i.e., the CA), while posing no limits on the quality of the retrieved certificates. By contrast, assumption A3) reduces the quality of certificates by limiting the amount of evidence that can be collected. Services are in fact evaluated and certified on the basis of partial information describing their final software artifacts only, discarding all evidence related to, for instance, how the software artifacts have been implemented, which can still provide relevant insights [8]. This lack of information degrades the quality of decisions based on certificates [32], directly impairing assumption A4), which requires complete information to retrieve an accurate ranking for optimal service selection.
As a result, the effectiveness and usefulness of current certification schemes are strongly impaired by assumptions A3) and A4); this results in scenarios where the selected services can exhibit suboptimal behavior once provisioned, thus impacting the users' trust in service providers.
The certification scheme in this paper addresses these gaps by departing from assumptions A3) and A4). It extends the evaluation of target services beyond software artifacts (assumption A3)) as the basis of an accurate service selection (assumption A4)), by integrating relevant aspects of the target services that influence their certificates.

OUR APPROACH
Our reference scenario in Fig. 1 is a cloud environment where services are first certified and then selected according to their non-functional properties to ensure stable quality of service. It includes the following main parties: i) the service provider, which implements and distributes certified services; ii) the end user, who selects and integrates certified services within its system according to certified non-functional properties; iii) the certification authority (CA), which defines and executes a certification scheme proving non-functional properties on services. Our certification scheme is based on the novel concept of dimension (Section 3.1), departing from the state of the art and positively impacting all the involved parties (Section 3.2).

Dimension-Based Certification
The certification scheme in this paper implements a flow of activities composed of three steps: i) certification model definition (Section 4 and step (1) in Fig. 1), where the CA describes the activities to be executed on the target of certification to collect evidence proving the support of a non-functional property; ii) certification model execution (Section 5.1 and step (2) in Fig. 1), where the CA executes, with the help of its verification labs, the activities in the certification model; iii) certificate award (Section 5.2 and step (3) in Fig. 1), where the CA awards a certificate proving a given non-functional property to the target of certification according to the collected evidence. End users finally select and compose services with certified behavior on the basis of their certificates and corresponding properties (Section 6 and step (4) in Fig. 1).
Our scheme radically changes the definition and execution of the certification model by modifying the definition of non-functional property in the literature (e.g., [6]). Our definition of property, formalized in Section 4, models different aspects influencing the evaluation of the service behavior beyond simple software artifacts. New attributes (see Table 1), organized in dimensions, are added, where each dimension describes a particular aspect of the non-functional property, as follows.
Definition 1 (D). A dimension D is a set {(a_1, v_1), ..., (a_n, v_n)} of non-functional attributes, where each attribute is a pair (a_i, v_i), with a_i the attribute name and v_i ∈ V_{a_i} the value for a_i, denoted a_i.v_i. We note that V_{a_i} is a totally ordered set according to a total order relationship >_{a_i} defined by experts (e.g., the CA). >_{a_i} orders attribute values on the basis of their effect on the strength of the corresponding non-functional property, such that, for any pair of attribute values a_i.v_j, a_i.v_k ∈ V_{a_i}, a_i.v_j >_{a_i} a_i.v_k iff a_i.v_j increases the property strength more than a_i.v_k.
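To make Definition 1 concrete, the following minimal sketch (attribute names and orders are illustrative, not prescribed by the scheme) encodes two expert-defined total orders and the comparison >_{a}:

```python
# Illustrative sketch: expert-defined total orders V_a rank attribute values
# by how much they strengthen the property (index 0 = strongest value).
TOTAL_ORDERS = {
    "Prog. Lang.": ["Rust", "Java", "Python"],   # Rust >_a Java >_a Python
    "Replicas":    ["3", "2", "1"],              # more replicas, stronger property
}

def stronger(attr, v1, v2):
    """True iff v1 >_a v2, i.e., v1 strengthens the property more than v2."""
    order = TOTAL_ORDERS[attr]
    return order.index(v1) < order.index(v2)

# A dimension as a set of (attribute, value) pairs, here as a dict.
dimension_dev = {"Prog. Lang.": "Rust", "Replicas": "3"}

print(stronger("Prog. Lang.", "Rust", "Python"))  # True
```

The list position plays the role of the total order relationship, so comparing two values reduces to comparing their indices.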
While being generic and extensible, the approach in this paper considers three dimensions, as presented in Table 1: i) D_art, which includes attributes describing the software artifacts of the target (i.e., the attributes considered in the state of the art); ii) D_dev, which includes attributes describing the development process used to implement the target, such as attributes Prog. Lang., Dev. Proc., Type, State Mgmt., and Code Review;1 iii) D_eval, which includes attributes describing the verification process at the basis of the target certification, such as attributes Trust. Contr. and When.

Impact and Reference Example
Let us consider a service directory with five functionally-equivalent services s_1-s_5, to be certified for property reliability, whose non-functional attributes and corresponding dimensions are shown in Table 1. For instance, service s_4 is a microservice operating in 3 replicas spread across 3 zones, indirectly managing the application state, and implemented following a DevSecOps methodology. Following our scheme, the CA certifies each service s_1-s_5 for property reliability according to the three dimensions D_art, D_dev, and D_eval in the certification model (Section 4), whose execution (Section 5.1) triggers the release of the corresponding certificate (Section 5.2). Each certificate departs from assumption A3), expanding the service behavior beyond dimension D_art, and supporting a certification scheme that is i) fine-grained, since it models service behavior according to several detailed attributes, and ii) dimension-aware, since it organizes attributes in dimensions influencing the non-functional property [8]. Certificates are then matched and ranked against users' requirements to support service selection (Section 6). This addresses the shortcomings of assumption A4), grounding selection on more complete and accurate multi-dimensional information on service behavior.
The proposed scheme provides benefits for all the involved parties, as follows.

1. We note that the development process has a substantial impact on the resulting software and, in turn, on the non-functional properties it holds [8], [33].

Service provider: it retrieves certificates better reflecting the behavior of its services [30], [32], [34]. For instance, the certificate of s_4 models the corresponding development process, which outperforms the development processes of the other services in Table 1, being built on a DevSecOps approach and microservices with code review.

End user: it selects services according to more accurate certificates, supporting fully-informed and safe decisions. The accuracy of the properties in certificates, as well as the structure and content of the certificates, are in fact fundamental in decision-making [32]. For instance, when evaluated according to D_art alone, s_1 and s_4 are equivalent and clearly outperform the remaining services. However, when considering the additional dimensions in our scheme, it is clear that s_1 is largely unacceptable and s_4 is the best solution. In fact, s_1 is a legacy service developed following a Waterfall process, has not been validated using code review, and directly manages application state, a practice which is not recommended.

CA: it improves the quality and, in turn, the trustworthiness of its certification scheme, providing higher accuracy with a marginal increase in overhead.

CERTIFICATION MODEL DEFINITION
Our certification scheme defines a certification model detailing all activities required to certify a given target against a non-functional property (step (1) in Fig. 1).

Definition 2 (M). A certification model is a tuple of the form M = (p, ToC, E, F), where p is the non-functional property in Definition 3; ToC is the target of certification in Definition 4; E is the evidence collection model in Definition 5; F is the evaluation function in Definition 7, determining the final outcome of the certification model execution.
The certification model follows the state of the art in Section 2 and specifies the non-functional property to be certified on a target service according to an evidence collection process. It is prepared and cryptographically signed by the certification authority, and trusted by service providers and end users according to the chain of trust in Section 2 [6]. Hereafter we detail the different components of the certification model.

Non-Functional Property and Target of Certification
A non-functional property p describes the non-functional behavior of a target of certification, as follows.

Definition 3 (p). A non-functional property p is a pair (p̂, {D_1, ..., D_n}), where p̂ is an abstract property (i.e., the property name) and each D_i is a dimension organizing non-functional attributes as described in Definition 1.
Similarly, the target of certification is defined as a set of mechanisms that are logically grouped according to dimensions, as follows.

Definition 4 (ToC). A target of certification ToC is a set {Q_{D_1}, ..., Q_{D_n}}, where each Q_{D_i} = {u_1, ..., u_m} is a set of non-functional mechanisms u_j describing the target according to dimension D_i in Definition 1. A non-functional mechanism u_j is a pair (û, A_u), where û is a mechanism type and A_u is a set of values refining it [6].
We note that non-functional mechanisms are the means by which the target supports a non-functional property.
Example 1. Let us consider our reference example in Section 2. Property reliability is defined as p_rel = (p̂_rel, {D_art, D_dev, D_eval}). Attribute values are ordered according to >_{a_i} following their position in the corresponding definition. For instance, the values of attribute Prog. Lang. in D_dev are ordered as [Rust >_{a_pl} Java >_{a_pl} Python]. The certification model for s_4, defined as M_{s_4}, includes s_4 as ToC = {Q_{D_art}, Q_{D_dev}, Q_{D_eval}}, where Q_{D_art} = {Replica Manager=Kubernetes}, Q_{D_dev} = {Pipeline=File Content, Source Code=Rust, Code Review Document=File Content}, and Q_{D_eval} = {Certification Framework=Trusted-and-Continuous}. More in detail, s_4 is deployed in Kubernetes, is written in Rust and has a code review document (Q_{D_dev}), and is certified by means of a trustworthy and continuous certification framework (Q_{D_eval}).
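As an illustration, the ToC of Example 1 can be encoded as a per-dimension set of mechanisms; the field names below are ours, not part of the scheme:

```python
from dataclasses import dataclass

# Minimal, illustrative encoding of Example 1: each non-functional mechanism
# is a pair (mechanism type, set of refining values), as in Definition 4.
@dataclass(frozen=True)
class Mechanism:
    mech_type: str   # the mechanism type (the "hat" component)
    values: tuple    # the set A of values refining it

toc_s4 = {
    "D_art":  {Mechanism("Replica Manager", ("Kubernetes",))},
    "D_dev":  {Mechanism("Pipeline", ("File Content",)),
               Mechanism("Source Code", ("Rust",)),
               Mechanism("Code Review Document", ("File Content",))},
    "D_eval": {Mechanism("Certification Framework", ("Trusted-and-Continuous",))},
}

print(len(toc_s4["D_dev"]))  # 3
```

Grouping mechanisms by dimension keeps the target modular: each dimension can be certified, re-certified, or revoked independently.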

Evidence
The certification scheme collects evidence to prove that a target of certification holds a non-functional property. Evidence can be collected via testing, monitoring, or formal proofs, and is bound to the subset of the ToC it insists on. For simplicity but without loss of generality, we consider test-based evidence, where the execution of testing activities permits collecting evidence on the service behavior. Evidence is collected according to an evidence collection model, where E_D details the testing activities in dimension D, as follows.
Definition 5 (E_D). An evidence collection model E_D for dimension D is a set {{(u_1, t_1), ..., (u_n, t_n)}}, where each {(u_1, t_1), ..., (u_n, t_n)} is a single test case. Each test case consists of several steps (u_i, t_i), where u_i ∈ Q_D is the portion of the target the test step insists on, and t_i is the corresponding test step.

In other words, the evidence collection model E_D is a sequence of test cases. Each test case verifies a specific (set of) non-functional mechanisms in Q_D to prove a non-functional property in dimension D. It is a sequence of steps t_i specifying all inputs, preconditions, and postconditions for its execution, as well as the expected output, at design time [6]. The result of the execution of an evidence collection model E_D is a set of evidence {ev}_D, defined as follows.
Definition 6 (ev). An evidence ev is a set {(to_1, tr_1), ..., (to_n, tr_n)} describing the result of the execution of a single test case in E_D. It consists of several pairs (to_i, tr_i), where to_i is the output of the execution of test step t_i, and tr_i is either Success or Failure, indicating whether the execution of t_i is successful, that is, whether output to_i matches the expected output in t_i.
Evidence collection model E_D and the corresponding set of evidence {ev}_D provide the trust anchor of our scheme, binding certificates to concrete evidence retrieved from the execution of the test cases against the targets.
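The flow from Definition 5 to Definition 6 can be sketched as follows; the harness below is our own illustrative scaffolding (a step is modeled as a callable returning its output and expected output), not the paper's implementation:

```python
# Sketch of evidence collection: a test case is a list of (mechanism, step)
# pairs; executing it yields an evidence, i.e., a list of (output, result)
# pairs with result Success/Failure as in Definition 6.
def run_test_case(test_case):
    evidence = []
    for mechanism, step in test_case:
        output, expected = step(mechanism)
        result = "Success" if output == expected else "Failure"
        evidence.append((output, result))
    return evidence

def check_replicas(mechanism):
    observed = 3      # e.g., the replica count queried from the orchestrator
    return observed, 3  # (output, expected output fixed at design time)

ev = run_test_case([("Replica Manager=Kubernetes", check_replicas)])
print(ev)  # [(3, 'Success')]
```

A step's expected output is fixed at design time, so the comparison in `run_test_case` mirrors the Success/Failure criterion of Definition 6.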
Example 2. Following Example 1, we present an excerpt of the evidence collection model M_{s_4}.E_{D_art} = {{(Replica Manager=Kubernetes, Get-Orchestrator), (Replica Manager=Kubernetes, Check-Replicas), (Replica Manager=Kubernetes, Check-Zones)}}. It contains one test case consisting of three steps insisting on the same mechanism Replica Manager. It first retrieves the orchestrator, checking whether it is an HA-enabled Kubernetes cluster (step "Get-Orchestrator"); it then verifies the number of replicas, checking whether it is compatible with the expected number of replicas (step "Check-Replicas"); it finally verifies the number of zones (e.g., data centers) where the replicas are spread, checking whether it is compatible with the expected number of zones (step "Check-Zones"). We note that the details of each step (e.g., preconditions, inputs) are omitted for brevity. Table 2 shows the complete M_{s_4}.E.

Evaluation Function
The last component of the certification model is the evaluation function F. It determines the outcome (success or failure) of evidence collection and, if positive, enables certificate release. It follows the concept of dimension and is modeled as a sequence of Boolean functions, each retrieving the outcome of evidence collection in a specific dimension. The functions are then combined using the Boolean operator AND, as follows.
Definition 7 (F). The evaluation function F is defined as F_{D_1} ∧ ... ∧ F_{D_n}, where each F_{D_i} is a function returning the outcome of the certification model execution within dimension D_i according to the collected evidence {ev}_{D_i}.

According to the dimensions in Section 3, F is defined as F_{D_art} ∧ F_{D_dev} ∧ F_{D_eval}. Each function takes value: i) ⊤, in case of success, allowing certificate release, or ii) ⊥, in case of failure, preventing certificate release. We note that the evaluation function can support complex rules determining the result of the certification, beyond the "all-or-nothing" approach in this paper, where all evidence must be successfully collected and all dimensions must be successfully certified.
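The conjunction of per-dimension outcomes can be sketched as follows; the data layout (a dict of evidence lists per dimension) is our own illustrative choice:

```python
# Sketch of F = F_Dart AND F_Ddev AND F_Deval: a dimension succeeds iff
# every evidence collected in it is successful (all-or-nothing rule).
def succ(ev):
    """Succ(ev): an evidence is successful iff all its steps returned Success."""
    return all(result == "Success" for _, result in ev)

def evaluate(evidence_by_dim):
    """F: conjunction of the per-dimension outcomes over all evidence."""
    return all(succ(ev) for evs in evidence_by_dim.values() for ev in evs)

evidence = {
    "D_art":  [[(3, "Success")]],
    "D_dev":  [[("Rust", "Success")]],
    "D_eval": [[("Trusted-and-Continuous", "Success")]],
}
print(evaluate(evidence))  # True
```

Replacing `all` with a weighted or threshold-based rule would realize the more complex evaluation policies mentioned above.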

CERTIFICATION MODEL EXECUTION AND CERTIFICATE AWARD
Our certification scheme executes the certification model in Section 4 and, if successful, awards a certificate to the ToC (steps (2) and (3) in Fig. 1). The soundness of the entire scheme is built on the well-formedness of the certification model defined by the CA.
Well-formedness requires that, for each mechanism u_i forming the ToC, there exists at least one test step (u_j, t_j) in the sequence of test cases M.E verifying u_i. In other words, each mechanism in the target of certification must be verified by the test cases defined in the certification model.
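This coverage condition can be checked mechanically; the sketch below uses our own toy representation of mechanisms and test cases:

```python
# Sketch of the well-formedness check: every mechanism in the ToC must be
# exercised by at least one test step in the certification model's test cases.
def well_formed(toc_mechanisms, test_cases):
    covered = {mech for case in test_cases for mech, _step in case}
    return set(toc_mechanisms) <= covered

toc = {"Replica Manager", "Source Code"}
cases = [[("Replica Manager", "Check-Replicas")],
         [("Source Code", "Check-Code")]]

print(well_formed(toc, cases))                   # True
print(well_formed(toc | {"Pipeline"}, cases))    # False: Pipeline never tested
```

The subset test makes the requirement explicit: an uncovered mechanism immediately invalidates the certification model.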
In the remainder of this section, we present certification model execution (Section 5.1) and certificate award (Section 5.2), completing our certification scheme. The pseudocode of these steps is reported in Fig. 2.

Certification Model Execution

According to the three considered dimensions in Section 3, three views are induced on the certification model: V_art, V_dev, and V_eval. Each view is evaluated in two steps, as follows.

1) Evidence Collection. The certification process collects evidence {ev}_D by executing the test cases in V.E_D against the portion Q_D of the ToC.

2) Individual View Evaluation. Evaluation function F_D ∈ {F_{D_art}, F_{D_dev}, F_{D_eval}} determines the result (⊤ or ⊥) of the evidence collection at step 1) in the corresponding dimension D_art, D_dev, D_eval. In particular, F_D is a Boolean expression over the collected evidence, requiring each evidence ev ∈ {ev}_D to be successful (denoted Succ(ev) = ⊤). An evidence is successful if all the test steps therein return Success (Definition 6). In other words, evaluation function F_D returns ⊤ iff every evidence in {ev}_D is successful.

Example 3. Following Example 2, the evidence for view V_art is collected by executing the test cases in Table 2 (step (1)). All evidence {ev}_{D_art} is successful; the view is therefore evaluated ⊤ (step (2)).

Certificate Award
A successful certification model execution triggers the release of a certificate. It includes three main components: i) the certification model, ii) the set of collected evidence, and iii) a set of test metrics describing the evidence collection performance. A certificate is formally defined as follows.

Definition 10 (C). A certificate C is a triple (M, {ev}, {(m_1, v_1), ..., (m_n, v_n)}), where M is the certification model, {ev} the set of collected evidence, and {(m_i, v_i)} a set of test metrics.

Metrics {(m_i, v_i)} describe the performance of the evidence collection model, where m_i is a metric class and v_i its normalized value; notation m_i.v denotes the value of metric m_i. Table 3 reports the considered metrics.
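A certificate's three components can be carried in a simple container; the class below is an illustrative sketch (names are ours), with metric values normalized to [0, 1] as in Table 3:

```python
from dataclasses import dataclass

# Illustrative certificate container: certification model, collected
# evidence, and normalized test metrics, as in the definition above.
@dataclass
class Certificate:
    model: object     # the certification model M
    evidence: list    # the set of collected evidence {ev}
    metrics: dict     # {metric name: normalized value in [0, 1]}

    def metric(self, name):
        """Return the value of metric name (the m_i.v notation)."""
        return self.metrics[name]

c = Certificate(model="M_s4", evidence=[],
                metrics={"Input Partition Coverage": 0.977,
                         "Branch Coverage": 0.876})
print(c.metric("Branch Coverage"))  # 0.876
```

Keeping the model and evidence inside the certificate is what enables the replication check described next: a verifier can re-execute E and compare against {ev}.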
We note that the process for certificate award can be replicated by executing E, and its correctness evaluated against E, F, and {ev}.

For instance, considering the certificate of s_4 in Table 4, test case {Check-Code}, checking whether the used programming language is Rust, returned output {Rust} and therefore result {Success}. For brevity, Table 4 only reports the names of the test cases. {(Input Partition Coverage, 0.977), (Branch Coverage, 0.876)} is a subset of the metrics in Table 3.

SERVICE SELECTION
The certification model execution in Section 5 enables end users to select certified services according to their specific non-functional needs, increasing the trustworthiness of their systems. We assume certified services to be functionally equivalent (i.e., to offer the same functionality) and to match the users' functional requirements (step (4) in Fig. 1). The service selection process builds on dimension lattices as the means to specify users' requirements on services and rank services according to their certificates. Each lattice is induced by a dimension of a non-functional property, and is defined as follows.
Definition 11 ((D, ⪯_p)). Let D be a dimension and p a non-functional property. The dimension lattice is a pair (D, ⪯_p), where D = V_{a_1} × ... × V_{a_n}, with V_{a_i} being the domain of attribute a_i (Definition 1), and ⪯_p is a partial order relationship over D such that, for each pair of elements (dimensions) D_i, D_j ∈ D, D_i ⪯_p D_j iff, for all attributes a_k ∈ D_i, D_j, either D_i.a_k.v >_{a_k} D_j.a_k.v or D_i.a_k.v = D_j.a_k.v.

In other words, a dimension lattice contains all the possible dimensions organized according to a partial order (dominance) relationship. We note that the empty value {} models either an attribute not having a value or an attribute whose value is unknown. Such a value is ranked last in the total order relationship in Definition 1, that is, ∀v_j ∈ V_{a_k}, v_j >_{a_k} {}. We also note that the total order relationships >_{a_k} at the basis of the lattice are defined by experts (i.e., the CA, see Definition 1).
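The dominance relationship of Definition 11 reduces to an attribute-wise comparison against the expert-defined orders; the sketch below keeps the text's convention that the dominant element appears on the left of ⪯_p (orders and elements are illustrative, with the empty value "" ranked last):

```python
# Sketch of the dominance test of Definition 11: D_i dominates D_j iff,
# attribute by attribute, D_i's value strengthens the property at least
# as much as D_j's (lower index in the total order = stronger value).
ORDERS = {"Repl.": ["3", "2", "1", ""],
          "HA Prot.": ["Managed", "Custom", ""]}

def dominates(d_i, d_j):
    return all(ORDERS[a].index(d_i[a]) <= ORDERS[a].index(d_j[a])
               for a in ORDERS)

top = {"Repl.": "3", "HA Prot.": "Managed"}
mid = {"Repl.": "2", "HA Prot.": "Custom"}
print(dominates(top, mid), dominates(mid, top))  # True False
```

Two elements that disagree in opposite directions on different attributes are incomparable, which is exactly why the structure is a lattice rather than a chain.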
A ranking function R: (D, ⪯_p) → [0, 1] assigns a value to each element D of the lattice according to the following condition: ∀D_i, D_j ∈ (D, ⪯_p), R(D_i) ≥ R(D_j) iff D_i ⪯_p D_j, that is, the ranking function is compatible with the lattice ordering. We consider a standard ranking function, defined as follows.

Definition 12 (R). The ranking function is defined as R(D) = (L(D) + 1) / n, where D is a lattice element, L(D) is the function returning the number of arcs of the minimum path from the least element to D in the corresponding Hasse diagram of the lattice, and n = max(L(D)) + 1.

Table 3. Test metrics.
Input Partition Coverage: the degree to which test cases cover the partitions of the service inputs.
Branch Coverage: the degree to which test cases cover the branches of the service.
Condition Coverage: the degree to which test cases cover the conditions of the service.
Path Coverage: the degree to which test cases cover the possible linearly independent paths of the service [37].

Example 6. Fig. 5a shows an example of lattice (D_art, ⪯_rel) for dimension D_art and property reliability. For brevity, we only consider attributes Replicas and HA Prot. (the former denoted "Repl." in Fig. 5a). We note that relationships >_{Repl.} and >_{HA Prot.} are defined by the CA. The least element of the lattice is the worst one, corresponding to R(D) = 0.16. It refers to a service whose attributes are unknown/cannot be certified. The top element of the lattice is the optimum, corresponding to R(D) = 1. It refers to a service with 3 replicas and a managed HA protocol. We note that elements located at the same lattice level are equivalent and associated with the same value of the ranking function, for instance, R(D) = 0.5 for level 2, and R(D) = 0.83 for level 4.
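For a lattice built as a product of totally ordered attribute domains, the minimum path length L(D) from the least element is the sum of the per-attribute distances from the worst value; the sketch below reproduces the shape of Example 6 with our own illustrative orders:

```python
# Sketch of R(D) = (L(D)+1)/n on a product-of-chains lattice: L(D) is the
# sum of per-attribute steps from the worst value (the empty value "" is
# ranked last, i.e., worst), and n = max(L(D)) + 1.
ORDERS = {"Repl.": ["", "1", "2", "3"],          # worst ... best
          "HA Prot.": ["", "Custom", "Managed"]}

def rank(d):
    level = sum(ORDERS[a].index(d[a]) for a in ORDERS)    # L(D)
    n = sum(len(o) - 1 for o in ORDERS.values()) + 1      # max(L(D)) + 1
    return (level + 1) / n

print(rank({"Repl.": "", "HA Prot.": ""}))         # 1/6, the least element
print(rank({"Repl.": "3", "HA Prot.": "Managed"})) # 1.0, the top element
```

With these two attributes the maximum level is 5, so n = 6 and the least element gets 1/6 (about 0.16), matching the values discussed in Example 6.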
Service selection consists of three activities: i) service filtering, which collects compatible services matching users' requirements; ii) service ranking, which ranks compatible services according to the property they hold; iii) totally-ordered service ranking, which provides a totally-ordered ranking of services according to the certificate metrics in Definition 10. Fig. 4 shows the pseudocode of these steps, while Figs. 5a, 5b, 5c, and 5d show an example based on services s_1-s_5 in Table 1.

Service Filtering
Service filtering receives as input a set of certified services and users' requirements, and returns as output the subset of compatible services addressing the requirements. Given a non-functional property p, a requirement insisting on certified services holding p is a set of the form {(glb, lub)_1, ..., (glb, lub)_n}, where each (glb, lub)_i defines the greatest lower bound (glb_i) and the least upper bound (lub_i) on the lattice induced by dimension D_i of p.
Service filtering operates on each dimension independently, then combines the results as follows.
Definition 13 (Service Filtering). Let {s_1, ..., s_n} be a set of services, {C_1, ..., C_n} the set of corresponding certificates for property p, and {(glb, lub)_i} the users' requirements for each dimension D_i. The result of service filtering is the set {s_k | ∀i, lub_i ⪯_p C_k.p.D_i ⪯_p glb_i} of compatible services.
In other words, service filtering returns the set of compatible services satisfying users' requirements in all dimensions and lattices, according to partial order relationship ⪯_p in Definition 11.

Example 7. Following Example 6, Fig. 5b shows an example of service filtering. For brevity, we report only dimension D_art. User requirement {(Repl.=2, HA Prot.=Custom), (Repl.=3, HA Prot.=Managed)} indicates glb and lub in Definition 13, resp., depicted as grey-filled nodes in Fig. 5b. Services whose non-functional property is within glb and lub are kept for the following activities, namely s_1, s_2, s_4, s_5, while s_3, having a property dominated by glb, is filtered out.
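Filtering on one dimension then amounts to two dominance checks per service, one against lub and one against glb; the data below is illustrative and self-contained:

```python
# Sketch of service filtering (Definition 13) on a single dimension:
# a service is kept iff lub dominates its certified dimension and its
# certified dimension dominates glb (lower index = stronger value).
ORDERS = {"Repl.": ["3", "2", "1", ""],
          "HA Prot.": ["Managed", "Custom", ""]}

def dominates(d_i, d_j):
    return all(ORDERS[a].index(d_i[a]) <= ORDERS[a].index(d_j[a])
               for a in ORDERS)

def filter_services(certified, glb, lub):
    return {s for s, dim in certified.items()
            if dominates(lub, dim) and dominates(dim, glb)}

certified = {"s2": {"Repl.": "2", "HA Prot.": "Custom"},
             "s3": {"Repl.": "1", "HA Prot.": ""},
             "s4": {"Repl.": "3", "HA Prot.": "Managed"}}
glb = {"Repl.": "2", "HA Prot.": "Custom"}
lub = {"Repl.": "3", "HA Prot.": "Managed"}
print(sorted(filter_services(certified, glb, lub)))  # ['s2', 's4']
```

In the multi-dimensional case the same test is repeated per dimension, and a service survives only if it is compatible in every dimension.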

Service Ranking
Service ranking orders compatible services according to the non-functional property they hold. This activity is modeled as a Multi-Criteria Decision-Making (MCDM) problem, ranking n alternatives (the certified services) according to m (weighted) criteria (the dimensions forming the property, expressed in terms of the ranking function in Definition 12). Each criterion is associated with a weight reflecting its importance in the ranking; in our case, each dimension D_i is associated with a weight W[i] ∈ [0, 1].
We use VIKOR [38] to find a compromise solution, that is, the solution closest to the ideal among conflicting criteria. It receives as input the set {s_1, …, s_n} of compatible services in Definition 13, the corresponding set of certificates {C_1, …, C_n}, and a vector W of weights, and returns as output a ranking of the services, identifying the best services in terms of their non-functional property. The weights must sum to 1, formally Σ_i W[i] = 1. VIKOR-based service ranking is a five-step process working as follows.
Step 1. It takes as input the set {s_j} of services, the corresponding set of certificates {C_j}, and the weight vector W, and computes the positive-ideal (Eq. (1)) and the negative-ideal (Eq. (2)) alternative for all dimensions D_1, …, D_m.
The positive-ideal (negative-ideal, resp.) alternative represents the best (worst, resp.) certified service in terms of the non-functional property in the i-th dimension.
Step 2. It computes the group maximum utility U_j and the minimum individual regret Z_j of a certified service j, according to the following L_p-metric.
where 1 ≤ p ≤ ∞. L_{p,j} is the (normalized) distance between service j and the positive-ideal service. Based on this metric, we compute the group maximum utility U_j (Eq. (4)) and the minimum individual regret Z_j (Eq. (5)) of a certified service j.
Step 3. It computes the sorting index Q_j for each service j using Eq. (6).
Step 4. It ranks the services according to the values U, Z, and Q of the corresponding certificates in ascending order, producing three ranking lists.
Step 5. It proposes as the best compromise the service having the lowest value of index Q, denoted Q′, if the following conditions are met: 1) Q′ provides an acceptable advantage over the second-lowest value of Q, denoted Q″, that is, Q″ − Q′ ≥ 1/(n − 1), with n the number of alternatives (services); 2) Q′ provides acceptable stability in the decision making, that is, Q′ is also ranked best by U and/or Z. If condition 1) is not satisfied, the set of compromise solutions consists of Q′, Q″, …, Q^(k), where Q^(k) is given by the relation Q^(k) − Q′ < 1/(n − 1) for maximum k. In other words, it returns all the solutions lying in the acceptable advantage interval.
If only condition 2) is not satisfied, the set of compromise solutions consists of Q′ and Q″, since there is no decision-making stability and Q′, Q″ represent the same compromise.
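Steps 1–3 can be condensed into a short, generic VIKOR routine. This is a minimal sketch, assuming higher values of the ranking function are better and a consensus threshold v = 0.5; it is not the paper's pandas-based implementation, and all names and data are illustrative.

```python
# Minimal VIKOR sketch (Steps 1-4): scores[j][i] is the value of service j
# on dimension i, as produced by a ranking function; weights sum to 1.

def vikor(scores, weights, v=0.5):
    """Return (Q, order), where order lists service indices by ascending Q."""
    n, m = len(scores), len(weights)
    # Step 1: positive-ideal (best) and negative-ideal (worst) per dimension.
    best = [max(s[i] for s in scores) for i in range(m)]
    worst = [min(s[i] for s in scores) for i in range(m)]
    # Step 2: group utility U_j (L1 distance) and regret Z_j (L_inf distance).
    U, Z = [], []
    for s in scores:
        d = [weights[i] * (best[i] - s[i]) / ((best[i] - worst[i]) or 1)
             for i in range(m)]
        U.append(sum(d))
        Z.append(max(d))
    # Step 3: sorting index Q_j, blending utility and regret with weight v.
    Us, Uw, Zs, Zw = min(U), max(U), min(Z), max(Z)
    Q = [v * (U[j] - Us) / ((Uw - Us) or 1)
         + (1 - v) * (Z[j] - Zs) / ((Zw - Zs) or 1) for j in range(n)]
    # Step 4: ascending ranking by Q (Step 5's advantage/stability checks
    # would then compare Q[order[0]] and Q[order[1]]).
    order = sorted(range(n), key=lambda j: Q[j])
    return Q, order

Q, order = vikor([[3, 1, 2], [1, 1, 1], [2, 3, 3]], [0.4, 0.3, 0.3])
print(order[0])  # 2: the service closest to the positive ideal
```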
Example 8. Following Example 7, Fig. 5c shows an example of service ranking. The values of the ranking function R are taken according to the lattices of the three dimensions and the values in Table 1. Service s_4 is the only compromise solution, having the lowest value of index Q and satisfying both conditions. In particular, condition 1) is satisfied because Q_2 − Q_4 ≥ 1/(n − 1), as 0.986 ≥ 0.3. We note that services s_2 and s_5 are equivalent, that is, Q_2 = Q_5, and share the same ranking position.

Totally-Ordered Service Ranking
The index Q, along with the results of conditions 1) and 2), is not enough to compute a total ordering of certified services. The likelihood of different services receiving the same ranking is in fact not negligible and, when condition 1) does not hold, can result in all services tied for first. The reason lies in how a dimension is mapped into the number given as input to VIKOR. In our case, this number is computed by the ranking function R in Definition 12, and depends only on the position of the dimension in the lattice, calculated as the distance between the current dimension and the least dimension. Hence, even on lattices with a large number of elements, the number of possible outputs of R is typically small. To address this, we follow the approach in [35] and further compare equivalent services using the metrics in Definition 10. Metrics express the strength of the collected evidence supporting the certified non-functional property and, in turn, the strength of the certificate.
The totally-ordered ranking is retrieved by a function that takes as input i) the ranking {(s_j, Q_j)} returned by VIKOR (Section 6.2), where s_j is a compatible service with certificate C_j and Q_j the corresponding VIKOR index; ii) a vector W_m of weights in [0, 1] expressing the importance of the metrics, such that Σ_i W_m[i] = 1, with |W_m| the number of metrics. We note that, for simplicity, a set of predefined vectors can be used. The function producing the totally-ordered ranking works as follows.
Step 1. It takes as input the ranking list and the weight vector W_m, and produces as output the strength μ_j of each certificate C_j, according to Eq. (7).
In other words, the certificate strength μ_j is computed as a weighted sum of the metrics contained in the certificate.
Step 2. It takes as input the certificate strengths and the ranking list, and returns as output a total ordering, computed by refining the VIKOR ordering according to certificate strength. Formally, a service s_i is ranked higher (is better) than a service s_j iff one of the following conditions holds.
Condition 1) states that s_i is better than s_j if its VIKOR index Q_i is lower; in this case, no additional sorting is needed. Condition 2) states that, when the VIKOR indexes of the two services are equal, certificate strength is used to provide a total order. In other words, services are ranked first according to their VIKOR index and, in case of ties, according to their certificate strength. If some services are still ranked at the same position, a random sort is used.
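The two-level comparison can be sketched as follows, assuming certificates are reduced to lists of metric values; the data mirrors the shape of Example 9, but the numbers are made up.

```python
# Sketch of the totally-ordered ranking: sort by VIKOR index Q first and,
# on ties, by certificate strength (a weighted sum of metric values, as in
# Eq. (7)). Names and data layout are illustrative assumptions.

def certificate_strength(metrics, w_m):
    """mu_j: weighted sum of a certificate's metric values."""
    return sum(w * v for w, v in zip(w_m, metrics))

def total_order(ranked, metrics_of, w_m):
    """ranked: list of (service, Q_j); metrics_of: service -> metric values.
    Higher strength wins on ties, hence the negated strength in the key."""
    return sorted(ranked,
                  key=lambda sq: (sq[1],
                                  -certificate_strength(metrics_of[sq[0]], w_m)))

ranked = [("s4", 0.0), ("s2", 0.7), ("s5", 0.7), ("s1", 1.0)]
metrics = {"s4": [0.9, 0.8], "s2": [0.6, 0.5],
           "s5": [0.8, 0.7], "s1": [0.2, 0.3]}
w_m = [0.5, 0.5]  # equally-weighted metrics, summing to 1
print([s for s, _ in total_order(ranked, metrics, w_m)])
# ['s4', 's5', 's2', 's1']: s5 outranks s2 because its strength is higher
```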
Example 9. Following Example 8, Fig. 5d shows an example of totally-ordered ranking. For brevity, we consider only the metrics Input Partition Coverage and Branch Coverage in Table 4, abbreviated IPC and BC, resp. Totally-ordered ranking disambiguates between services having the same VIKOR index, namely s_2 and s_5, according to certificate strength μ. The resulting totally-ordered ranking is therefore s_4, s_5, s_2, s_1.

EXPERIMENTS
We experimentally evaluated the performance and quality of the proposed approach in a simulated environment. Experiments were run on a laptop equipped with an Intel® Core i7-5500U @ 2.4 GHz (2 cores, 4 threads), 16 GB of RAM, Ubuntu 20.04 x64, Java runtime OpenJDK 11.0.10, and Python runtime 3.8.6. We compared our approach with state-of-the-art certification schemes (e.g., [6], [11]) that, according to assumptions A3) and A4) in Section 2, evaluate software artifacts only. However, since the source code, as well as a precise modeling, of existing solutions is generally not available, we compared our solution with an approximation of the state of the art, instantiating our scheme on dimension D_art only.
We evaluated our scheme in terms of i) performance, measuring the execution time of its phases (Section 7.1); and ii) quality, comparing the result of service selection against the state-of-the-art and the global optimum (Section 7.2).

Performance
We evaluated the performance of our approach by running two experiments measuring the execution time of lattice building and service ranking.
The first experiment measured the time needed to construct the data structure holding the dimension lattice in Definition 11, using jhpl, an optimized Java library modeling lattices as sets of tries.² We generated lattices with a large number of elements, varying the number of attributes in {1, 3, 6, 9} and the number of possible values of each attribute in {15, 25, 50, 75, 100}. Figs. 6a and 6b show that the time needed for lattice building is negligible, never exceeding 0.025 milliseconds.
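The lattices generated in this experiment are products of per-attribute value chains, so their elements and covering edges can be enumerated in a few lines. The plain-Python sketch below only mimics the structure that jhpl stores as tries; it does not use the jhpl API, and all names are illustrative.

```python
# Enumerate a product lattice with `num_attributes` attributes, each taking
# `num_values` totally-ordered levels. An element is covered by the elements
# obtained by raising exactly one attribute by one level.
from itertools import product

def covers(elem, num_values):
    """Elements immediately above `elem` in the product lattice."""
    return [elem[:i] + (v + 1,) + elem[i + 1:]
            for i, v in enumerate(elem) if v + 1 < num_values]

# A small lattice with 2 attributes of 3 values each: 3^2 = 9 elements.
lattice = {e: covers(e, 3) for e in product(range(3), repeat=2)}
print(len(lattice))  # 9
```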
The second experiment measured the time needed to perform service ranking in Section 6.2. We implemented VIKOR on top of a Python library optimized for row- and column-wise operations,³ and varied the number of services in {100, 300, 600, 900, 1200, 1500, 1800} and the number of dimensions in {1, 3, 6, 9, 12}. Figs. 6c and 6d show that the performance depends only on the number of services involved, while the impact of the number of dimensions is negligible. The reason is that the time complexity is O(|services| · |dimensions|) ≈ O(|services|), because the number of services is typically one or more orders of magnitude larger than the number of dimensions. Even in a worst-case scenario, the execution time is very low, never exceeding 0.6 seconds. We note that our experiments did not measure the time for building a totally-ordered service ranking in Section 6.3, since its cost is that of a sorting algorithm and therefore well known, that is, O(|services| · log |services|).

Quality
We evaluated the quality of our multi-dimensional service ranking with respect to two approaches: i) the global optimum and ii) the state of the art. The optimum approach is a manual approach retrieving the services with the highest quality, while the state-of-the-art approach only considers the artifacts dimension D_art. Quality evaluation analyzes i) the cumulative penalty introduced by each dimension with respect to the optimum approach (Section 7.2.1), and ii) the similarities between the optimum ranking and the other approaches (Section 7.2.2).
2. https://github.com/prasser/jhpl
3. https://pandas.pydata.org/

Table 5 presents the experimental settings, varying the number of dimensions in {3, 6, 9}. For each setting, we defined 5 different configurations in the form of weights rating the importance of each single dimension. In particular, we considered different classes of weights: i) increasing weights, where each dimension i is more important than dimension i−1 (C3:1, C6:1, C9:1 in Table 5); ii) balanced weights, where all dimensions have the same importance (C3:2, C6:2, C9:2 in Table 5); iii) few-prevailing weights, where three dimensions with similar weights have more importance (C3:3, C6:3, C9:3 in Table 5); iv) decreasing weights, where each dimension i is less important than dimension i−1 (C3:4, C6:4, C9:4 in Table 5); v) unbalanced weights, where one dimension has much higher importance (C3:5, C6:5, C9:5 in Table 5). We then randomly generated 38,400 certificates, covering the entire domain of our ranking function in Definition 12. For each configuration, we randomly split the certificates into 30 data sets and i) manually calculated the optimum ranking, ii) executed our VIKOR-based service ranking, using the consensus decision threshold v = 0.5 (see Step 3 in Section 6.2), and iii) executed the state-of-the-art ranking, where services are ranked according to dimension D_art, that is, the first dimension in each setting. We finally averaged the results retrieved for each configuration. We note that, for simplicity, we assumed service filtering in Section 6.1 to return all certified services; this choice does not affect the quality of our experiments, as it applies to all approaches (global optimum, our approach, state of the art).

Captured Quality
A penalty metric P(s) is first defined, evaluating the degree to which a service s (and its certificate) diverges from the global optimum, as follows.
where max(D_i) is the highest value of the i-th dimension among all certified services and R(C.M.p.D_i) is the value of the same dimension for the service under evaluation, both computed according to the ranking function in Definition 12. We note that the penalty P(s) is the sum of the normalized penalties contributed by each dimension. The quality QU(s) of a service s is then defined as follows.
QU(s) = 1 − (P(s) − min(P)) / (max(P) − min(P))

where min(P) and max(P) are the minimum and maximum penalties among all certified services, resp. QU(s) = 0 is obtained when s has the lowest quality among all services; QU(s) = 1 is obtained when s has the highest quality. We retrieved the 10 best services in each approach as follows: i) global optimum: the 10 services with the highest quality; ii) our scheme: the 10 best services according to service ranking; iii) state of the art: the 10 services with the highest value of the ranking function in dimension D_art. We then calculated the average quality of the 10 selected services in each individual data set and configuration.

Table 6(a) and Figs. 7a, 7b, and 7c summarize our results. Our approach captures 92% of the quality of the optimum approach on average, that is, Avg(QU(s)) = 92%, compared to 72% for the state of the art. As expected, the only settings where the quality of the state of the art is close to the quality of our approach are those where dimension D_art has (much) higher importance (i.e., C3:5, C6:5, C9:5). Furthermore, our approach provides a consistent quality of 92% on average in all configurations, while the quality of the state-of-the-art approach decreases from 76% (3 dimensions) to 72% (6 dimensions) and 69% (9 dimensions) on average. Table 6(b) and Figs. 7d, 7e, and 7f show the percentage of cases in which the first-ranked service of our approach and of the state-of-the-art approach is the optimum one. Our approach retrieves the optimum in 73% of the cases, the state-of-the-art approach in only 13%. Even when the optimum is not reached, our approach always provides a remarkable quality ≥85% that, in most (98%) of the cases, is ≥90%.
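A minimal sketch of the two metrics, under the assumption (consistent with the text) that P(s) sums, per dimension, the normalized gap between the best observed value max(D_i) and the service's own value; data and names are illustrative.

```python
# Sketch of the penalty P(s) and quality QU(s) metrics. Each certified
# service is represented by its per-dimension ranking values R(C.M.p.D_i).

def penalty(service, all_services):
    """P(s): sum over dimensions of (max(D_i) - R_i(s)) / max(D_i)."""
    tops = [max(s[i] for s in all_services) for i in range(len(service))]
    return sum((top - v) / top for top, v in zip(tops, service) if top)

def quality(service, all_services):
    """QU(s) = 1 - (P(s) - min(P)) / (max(P) - min(P))."""
    ps = [penalty(s, all_services) for s in all_services]
    spread = (max(ps) - min(ps)) or 1
    return 1 - (penalty(service, all_services) - min(ps)) / spread

services = [[3, 2], [1, 1], [2, 2]]  # per-dimension ranking values
print([quality(s, services) for s in services])
# [1.0, 0.0, 0.714...]: the best service gets 1, the worst gets 0
```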
A second important aspect is the contribution each dimension gives to the penalty, that is, the distribution of the penalty among the dimensions. To this aim, we computed the highest normalized contribution to the penalty of a service s, that is, the maximum normalized penalty contributed by a single dimension. Table 6(c) and Figs. 7g, 7h, and 7i show our results. In our approach, the highest contribution to the penalty is 35% on average, compared to 39% for the global optimum and 41% for the state of the art. This means that our ranking favors balanced services, reducing scenarios in which selected services show high variance in dimension penalty.

Ranking Evaluation
We finally compared the ranking produced by our approach against the state of the art and the global optimum according to two metrics measuring distances between ranking lists: i) Kendall's τ distance [39] and ii) Spearman's footrule distance [40]. We used the experimental configurations adopted in Section 7.2.1 with totally-ordered service ranking, using the metrics in Table 4 with random values and equally-distributed weights. This way, metric values are assigned randomly to both high- and low-quality services, and the certificate strength in Section 6.3 is entirely computed on random values.

Kendall's τ distance counts the number of pairwise disagreements between two ranking lists. Let σ_q be the global optimum ranking list, σ_v the VIKOR-based ranking list, and σ_s the state-of-the-art ranking list, and let σ(i) denote the rank of element i in a given ranking list σ. Kendall's τ distance of rankings σ_v, σ_s with respect to σ_q is defined as

K(σ_y, σ_q) = K(σ_y) = Σ_{(i,j): i<j} inv(i, j)    (11)

where y ∈ {v, s} and inv(i, j) returns 1 if (σ_q(i) < σ_q(j) ∧ σ_y(i) > σ_y(j)) ∨ (σ_q(i) > σ_q(j) ∧ σ_y(i) < σ_y(j)), 0 otherwise. This distance can be normalized to the range [0, 1] by dividing by its highest possible value n(n−1)/2.

Spearman's footrule distance measures the total displacement between σ_v, σ_s and σ_q, and is defined as

S(σ_y, σ_q) = S(σ_y) = Σ_i |σ_q(i) − σ_y(i)|    (12)

where y ∈ {v, s}. This distance can be normalized to the range [0, 1] by dividing by its highest possible value n²/2.

Table 7 summarizes our results. In all cases, our ranking σ_v outperformed the state of the art. The number of pairwise disagreements K(σ_v) and the total displacement S(σ_v) are, on average, ≈3 times smaller than the state-of-the-art K(σ_s) and S(σ_s). More in detail, σ_v requires reordering only 10% of all pairs, compared to 29% for the state of the art; it also shows a total displacement of 15%, compared to 40% for the state of the art.
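Both distances, in their normalized form, can be sketched directly from the definitions above; rankings are given as position lists, where sigma[i] is the rank of element i, and the example data is illustrative.

```python
# Normalized Kendall tau and Spearman footrule distances between two
# ranking lists of the same n elements.

def kendall(sigma_y, sigma_q):
    """Fraction of pairs ranked in opposite order by the two lists,
    normalized by the maximum n(n-1)/2 disagreements."""
    n = len(sigma_q)
    disagree = sum(1 for i in range(n) for j in range(i + 1, n)
                   if (sigma_q[i] - sigma_q[j]) * (sigma_y[i] - sigma_y[j]) < 0)
    return disagree / (n * (n - 1) / 2)

def footrule(sigma_y, sigma_q):
    """Total rank displacement, normalized by the maximum n^2 / 2."""
    n = len(sigma_q)
    return sum(abs(q - y) for q, y in zip(sigma_q, sigma_y)) / (n * n / 2)

optimum = [1, 2, 3, 4]     # sigma_q: ground-truth ranks
candidate = [1, 3, 2, 4]   # sigma_y: middle two elements swapped
print(kendall(candidate, optimum), footrule(candidate, optimum))
```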
Therefore, our approach improves the state-of-the-art producing a ranking that, in terms of the aforementioned distances, is similar to a global optimum-based approach.
To conclude, our experiments show that the proposed ranking has good quality. It outperformed the state of the art, capturing 92% of the quality of the global optimum in all settings (Table 6(a)), and retrieved the global optimum almost six times more often than the state of the art (Table 6(b)). Furthermore, our approach has the lowest value of the highest normalized contribution to the penalty (Table 6(c)). This means that it favors balanced services, guaranteeing good quality and low variation among all dimensions.

CONCLUSION
An important goal of the evolution of ICT is to combine the opportunities provided by modern distributed systems composed of several services, in terms of efficiency, flexibility, and added-value applications, with an adequate level of trustworthiness in system behavior. The approach in this paper provides a concrete solution towards this goal, defining a novel certification scheme that goes beyond the simple evaluation of a service's behavior and considers additional factors describing, for instance, how the service has been implemented and verified, to increase the quality and performance of a distributed system. The proposed scheme enables a multi-dimensional certification approach based on a novel and fine-grained definition of non-functional properties, where the certification execution is split into distinct, logically-separated domains called dimensions. This modeling leads to more accurate certificates and, consequently, more accurate decision making when services/software are dynamically selected at run time on the basis of their certificates. Our approach benefits all the involved parties: service providers obtain certificates that better reflect the best practices they followed; end users take decisions based on more detailed and well-structured certificates; and the CA provides a more useful certification scheme.