A Methodology and Simulation-Based Toolchain for Estimating Deployment Performance of Smart Collective Services at the Edge

Research trends are pushing artificial intelligence (AI) across the Internet of Things (IoT)–edge–fog–cloud continuum to enable effective data analytics and decision making, as well as efficient resource usage to meet QoS targets. Approaches for collective adaptive systems (CASs) engineering, such as aggregate computing, provide declarative programming models and tools for dealing with the uncertainty and complexity that may arise from scale, heterogeneity, and dynamicity. Crucially, the aggregate computing architecture allows for “pulverization”: applications can be decomposed into many deployable micromodules that can be spread across the ICT infrastructure, thus allowing multiple potential deployment configurations for the same application logic. This article studies the deployment architecture of aggregate-based edge services and its implications in terms of performance and cost. The goal is to provide methodological guidelines and a model-based toolchain for the generation and simulation-based evaluation of potential deployments. First, we address this subject methodologically by proposing an approach based on deployment code generators and a simulation phase, whose resulting solutions are assessed with respect to their performance and costs. We then tailor this approach to aggregate computing applications deployed onto an IoT–edge–fog–cloud infrastructure, and we develop a corresponding toolchain based on Protelis and EdgeCloudSim. Finally, we evaluate the approach and tools through a case study of edge multimedia streaming, where the edge ecosystem exhibits intelligence by self-organizing into clusters to promote load balancing in large-scale dynamic settings.

and intelligence at the edge of the network, close to Internet of Things (IoT) devices and users. This benefits communication latency and data rate, and supports scalability through decentralization and locality. Enriching the edge with artificial intelligence (AI) capabilities [1]-[3] can be vital to unlocking the potential of the IoT, enabling large-scale data processing and reactive decision making. However, edge ecosystems tend to be complex due to the heterogeneity of the participating devices and the high dynamicity of relationships and goals (induced, e.g., by mobility, failures, environmental changes, and user activity). Prominent issues include defining efficient edge structures [4], coordinating edge resource providers and consumers [5], and supporting decision making for reconfiguration and load balancing [6].
Recently, the aggregate paradigm has proved valuable for engineering opportunistic services in IoT and EC scenarios [7]-[9] and for programming collective edge intelligence [3], [6]. A key benefit of aggregate computing is that its architecture allows pulverizing (i.e., finely partitioning) applications (which we call aggregate applications or aggregate systems) into several logical components and deployment units [10]. These units can be spread over the available infrastructure to define a particular deployment configuration. This deployment flexibility is critical to fully exploiting the opportunities of the IoT-edge-fog-cloud infrastructural continuum.
This work focuses on predicting how the deployment affects the performance and costs of smart edge services expressed as aggregate applications. This is a significant issue, since suboptimal deployments can negatively affect the performance (e.g., system reactivity to change due to latencies, or unavailability caused by energy depletion) and costs (e.g., in terms of network capacity or energy consumption) associated with these services. Therefore, for an effective engineering process of complex systems [11], it is crucial to evaluate and compare different target deployment configurations to mitigate the risk of ineffective deployments and reconfigurations (which may cause further costs and temporary QoS reduction). This is a problem of methods and tools, which should guide and support engineers across the various engineering phases. Accordingly, this article delineates a methodology and presents tools for assessing aggregate application deployments through simulation. More specifically, we provide the following contributions.
1) We propose a methodology applicable to pulverizable (partitionable) systems, which leverages deployment generators and simulators for assessing the performance and costs associated with a set of target deployment configurations.
2) We implement the methodology specifically for aggregate computing applications by mapping aggregate specifications to possible deployment configurations in a way reminiscent of Infrastructure as Code (IaC) [12].
3) We develop a toolchain integrating the Protelis aggregate programming engine [13] with the EdgeCloudSim simulator [14] for measuring metrics related to edge-cloud deployments.
4) We apply the approach and toolchain to the edge multimedia streaming case study presented in [6], which represents a large-scale scenario where edge intelligence is exploited for balancing the load on edge servers through dynamic clustering.
The remainder of the article is organized as follows. In Section II, we review state-of-the-art approaches for the engineering and simulation of collective adaptive systems (CASs) and provide background on the pulverizable architecture of aggregate computing applications. In Section III, we present our methodology and toolchain. In Section IV, we provide a quantitative evaluation of the approach. Finally, Section V provides concluding remarks and discusses significant directions for future work.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. Collective Adaptive Systems Development
In this section, we review the state of the art on CAS engineering, which aims to analyze and design the emergent behavior of large-scale situated cyber-physical systems. Example applications include crowd engineering for safe navigation and dispersal [15], smart mobility [16], situated problem solving [9], trust and reputation systems [17], robotics [18], [19], and resilient management of ICT infrastructures [20]. A recent survey of models, methods, and tools for rigorous CAS engineering can be found in [21]. In the following, we review programming and system specification approaches supporting the development of CAS-based applications.
Techniques for CAS programming generally leverage one or more of the following abstractions: ensembles, namely, dynamic collections of devices; collective communication interfaces, namely, abstractions enabling individual components to communicate with groups; and field-like data structures, namely, mechanisms to address data belonging to an entire group of components. Other approaches inspiring CAS development techniques can be found among macroprogramming and spatial computing contributions, as surveyed in [22]. In the following, we examine an approach to CAS programming based on the computational field abstraction, deriving from the spatial computing and coordination tradition [23].
1) Aggregate Computing: Aggregate computing [15] is a full-fledged paradigm for CAS engineering. It is formally founded on field-based coordination [23] to compositionally express collective adaptive behavior from a global perspective. Field-based coordination is captured by field calculi [23] and implemented through languages such as the standalone Protelis [13] and the Scala-internal ScaFi [24]. In field calculi, the whole behavior of the system is specified in terms of expressions manipulating computational fields, namely, maps from system components to values. For instance, querying a temperature sensor across a network yields a field of floating-point values denoting temperatures; checking whether the mean temperature across the neighbors' contributions exceeds a threshold yields a field of Boolean values denoting potentially critical conditions; and so on.
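To make the notion of a computational field concrete, the following minimal Python sketch represents a field as a dictionary from device identifiers to values and derives the Boolean "critical condition" field from the temperature example. It takes a centralized, global view for illustration only. In aggregate computing, each device would compute its own entry locally from neighbor messages. The device names, the neighborhood graph, and the choice to include a device's own value in its neighborhood mean are assumptions of this sketch.

```python
from statistics import mean
from typing import Dict, List

# A computational field: a map from device identifiers to local values.
Field = Dict[str, float]


def neighborhood_mean(field: Field, neighbors: Dict[str, List[str]]) -> Field:
    """Average each device's value with the values of its neighbors (self included)."""
    return {d: mean([field[d]] + [field[n] for n in neighbors[d]]) for d in field}


def exceeds(field: Field, threshold: float) -> Dict[str, bool]:
    """Pointwise comparison: turns a numeric field into a Boolean field."""
    return {d: value > threshold for d, value in field.items()}


# Usage: three sensors arranged in a line a - b - c (hypothetical data).
temperature = {"a": 20.0, "b": 30.0, "c": 40.0}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
critical = exceeds(neighborhood_mean(temperature, neighbors), threshold=28.0)
print(critical)  # {'a': False, 'b': True, 'c': True}
```

Every operation here maps whole fields to whole fields, which is the compositional, global-perspective style that field calculi formalize.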
From an operational perspective, each component (or device) belonging to an aggregate system has some middleware support for contributing to the aggregate application, which consists of an aggregate program plus configuration related to connectivity and scheduling. From a logical point of view, each device operates in computational rounds, each composed of the following steps.
1) Context Acquisition: Sensor information, messages from neighbors, and state information are collected.
2) Computation: The aggregate program is applied against the local context, yielding an output value and an outbound coordination message.
3) Actuation: The output value can be used to drive actuators.
4) Coordination: The outbound coordination message is expected to be broadcast to all the neighbors of the device.
Rounds of different devices may be asynchronous. This execution protocol is independent of the concrete aggregate program: different programs generate different outputs and, consequently, different messages for neighbors; the more complex the program, the larger the messages to be exchanged with neighbors. Note that, although the aggregate program is the same for all the devices, the individual behaviors generally differ, since the program is evaluated against a different local context on each device.
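The four-step round can be sketched as a minimal, single-process Python emulation. This is not the Protelis or ScaFi runtime, and it makes several simplifying assumptions: the outbound message coincides with the output value, actuation is omitted, and the example program is a hop-count gradient chosen only for illustration. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

INF = float("inf")


@dataclass
class Device:
    """A logical device of the aggregate system (simplified, hypothetical model)."""
    uid: int
    is_source: bool
    neighbors: List[int]
    inbox: Dict[int, float] = field(default_factory=dict)  # latest message per neighbor
    output: float = INF


# An aggregate program maps (device, neighbor messages) to an output value.
Program = Callable[[Device, Dict[int, float]], float]


def hop_gradient(dev: Device, nbr_values: Dict[int, float]) -> float:
    """Example program: hop-count distance to the nearest source device."""
    if dev.is_source:
        return 0.0
    return min((v + 1 for v in nbr_values.values()), default=INF)


def run_round(dev: Device, network: Dict[int, Device], program: Program) -> None:
    # 1) Context acquisition: snapshot of the messages received so far.
    context = dict(dev.inbox)
    # 2) Computation: the same program is evaluated against the local context.
    dev.output = program(dev, context)
    # 3) Actuation: omitted here; the output could drive an actuator.
    # 4) Coordination: broadcast the outbound message (here, the output itself).
    for n in dev.neighbors:
        network[n].inbox[dev.uid] = dev.output


# Usage: a line network 0 - 1 - 2 - 3 with device 0 as the source.
network = {
    i: Device(uid=i, is_source=(i == 0), neighbors=[j for j in (i - 1, i + 1) if 0 <= j < 4])
    for i in range(4)
}
for _ in range(10):
    for uid in (3, 1, 0, 2):  # rounds are not synchronized: an arbitrary fixed order
        run_round(network[uid], network, hop_gradient)
print([network[i].output for i in range(4)])  # [0.0, 1.0, 2.0, 3.0]
```

In this static network, changing the order in which devices run their rounds changes the transient values but not the final ones, which is consistent with rounds of different devices being asynchronous.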
The complete coverage of the theory and practice of aggregate computing is beyond the scope of this article: the interested reader can refer to [23] for more details on the approach and its main formal properties. For the sake of this article, and to provide a clue about how the aggregate paradigm works, let us consider a simple but paradigmatic example: the self-healing channel. This is a distributed, dynamic structure in a network that provides a hop-by-hop path from a source device to a target device, represented as a Boolean field mapping the devices belonging to the channel to true and the devices outside the channel to false. This channel can be programmed by composing other functions that leverage gradients [25] to estimate 1) the distance from any node to the source; 2) the distance from any node to the target; and 3) the source-to-target distance. With these three pieces of information available locally, each device can determine whether it belongs to the channel by exploiting simple geometry (the triangle inequality). Moreover, the above functions can be reused for other algorithms as well. Crucially, regarding dynamics, note that as devices move in the network, the fields from which the channel is computed self-adjust, eventually converging to their correct values, and the same happens for the channel. We remark that such a "global" computation (a channel spanning a network) is executed in a fully decentralized way, in terms of local sensing (distance to neighbors) and communication (as prescribed by the aggregate program). The reusability of aggregate behaviors, modeled as functions from input to output fields, enables the paradigm to scale with complexity and to discover high-level coordination patterns [6]; one of these, implementing clustered feedback loops, is especially convenient for edge coordination and is considered for the case study in Section IV.
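The channel's logic can be illustrated with a centralized Python sketch. It computes the three distance fields with breadth-first search, which stands in for the gradients of [25] (the aggregate version computes them in a decentralized, self-adjusting way), and then applies the triangle inequality at each node. Using hop counts as the distance metric, the `width` tolerance parameter, and all names are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]
INF = float("inf")


def gradient(graph: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Hop-count distance from `source` to every node (breadth-first search)."""
    dist = {n: INF for n in graph}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if dist[v] == INF:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def channel(graph: Graph, source: Hashable, target: Hashable, width: float = 0) -> Dict[Hashable, bool]:
    d_src = gradient(graph, source)  # 1) distance from each node to the source
    d_dst = gradient(graph, target)  # 2) distance from each node to the target
    d_st = d_src[target]             # 3) source-to-target distance
    if d_st == INF:                  # disconnected: no channel can be built
        return {n: False for n in graph}
    # Triangle inequality: d_src + d_dst == d_st exactly on shortest paths;
    # `width` also admits nodes on slightly longer detours.
    return {n: d_src[n] + d_dst[n] <= d_st + width for n in graph}


# Usage: a short path a-b-c and a longer detour a-d-e-c (undirected adjacency lists).
g = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "e"], "d": ["a", "e"], "e": ["d", "c"]}
print(sorted(n for n, inside in channel(g, "a", "c").items() if inside))           # ['a', 'b', 'c']
print(sorted(n for n, inside in channel(g, "a", "c", width=1).items() if inside))  # ['a', 'b', 'c', 'd', 'e']
```

Re-running the functions after removing node b from the adjacency lists yields a channel along the detour. This mirrors, as a static recomputation, the self-healing behavior described above.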
2) Pulverizable Architecture of Aggregate Applications: The execution and coordination protocol described in the previous section is not a rigid schema: it is flexible and may be adapted, both offline and online, to take into account various tradeoffs between performance and cost. For instance, the frequency of operation of the devices can be adjusted to closely match the rate of change of the phenomena under monitoring. Moreover, the responsibilities of a component of the aggregate (sensing, computation, actuation, and coordination) do not need to be deployed together. This notion, known as pulverization [10], enables the partitioning [26] of aggregate applications into several deployment units, which may be deployed in various ways on a target ICT infrastructure, leading to potentially different performance and costs [8], [10]. Fig. 1 summarizes the notion of pulverization for an aggregate computing system. In this partitioning model, the responsibilities corresponding to the activity of an individual logical device are split into the following parts (Fig. 1): sensors, actuators, behavior, state, and communication, the latter being responsible for exchanging information with neighbors, as prescribed by the behavior. An implementation would then provide a different deployable component (or service) for each of these parts, and a deployment would specify which components are put onto which physical devices (Fig. 1).
In aggregate computing, the aggregate system that is programmed is, first of all, a logical entity, potentially decoupled from an underlying physical system. Typically, a logical node (Fig. 1) is associated with a concrete node of a networked system (e.g., a robot, a smartphone, or a server in a cluster), its physical twin, and provides a "reasoning cycle" to turn its sensor observations into actuation instructions (cf. Fig. 1). However, certain responsibilities (e.g., computation, state, and communication) can, in principle, be offloaded to other physical devices, e.g., edge, fog, or cloud nodes (cf. Fig. 1). These responsibilities may be offloaded because of limitations of the physical twin (e.g., it is a thin host with little storage, energy, and computational capabilities), to enable neighboring relationships that are not related to physical connectivity, or as part of a strategy to optimize the execution of the aggregate computation [8], [10] (e.g., exploiting co-location of the communication components to realize zero-latency and zero-bandwidth interaction). In the following sections, we show how pulverizable architectures can be considered in the context of a methodology-supported engineering process, and we evaluate the performance and costs associated with different aggregate application deployments (Section IV).
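A pulverized deployment can be modeled as a mapping from (logical device, component) pairs to physical hosts. The Python sketch below is an illustration, not part of the toolchain. It counts how many per-round interactions cross host boundaries, which shows the trade-off mentioned above: offloading to an edge server makes sensing and actuation remote, but it co-locates the communication components of neighboring devices. The interaction list, the host names, and the use of a plain count (rather than latencies or bandwidth) are assumptions.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Component(Enum):
    SENSORS = "sensors"
    ACTUATORS = "actuators"
    BEHAVIOR = "behavior"
    STATE = "state"
    COMMUNICATION = "communication"


# Interactions inside one logical device during a round (a simplifying assumption).
LOCAL_LINKS: List[Tuple[Component, Component]] = [
    (Component.SENSORS, Component.BEHAVIOR),
    (Component.BEHAVIOR, Component.ACTUATORS),
    (Component.BEHAVIOR, Component.STATE),
    (Component.BEHAVIOR, Component.COMMUNICATION),
]

# A deployment maps (logical device, component) to the physical host running it.
Deployment = Dict[Tuple[str, Component], str]


def deploy(devices: Iterable[str], placement: Dict[Component, str]) -> Deployment:
    """Apply the same placement to every device; "{d}" stands for the device's physical twin."""
    return {(d, c): placement[c].format(d=d) for d in devices for c in Component}


def remote_interactions(dep: Deployment, devices: Iterable[str],
                        neighbor_pairs: Iterable[Tuple[str, str]]) -> int:
    """Count the per-round interactions that must cross a host boundary."""
    devices = list(devices)
    local = sum(dep[(d, a)] != dep[(d, b)] for d in devices for a, b in LOCAL_LINKS)
    among_neighbors = sum(
        dep[(d1, Component.COMMUNICATION)] != dep[(d2, Component.COMMUNICATION)]
        for d1, d2 in neighbor_pairs
    )
    return local + among_neighbors


# Usage: two neighboring logical devices under two alternative deployments.
devices, pairs = ["u1", "u2"], [("u1", "u2")]
everything_on_device = deploy(devices, {c: "phone-{d}" for c in Component})
thin_devices = deploy(devices, {
    Component.SENSORS: "phone-{d}", Component.ACTUATORS: "phone-{d}",
    Component.BEHAVIOR: "edge-1", Component.STATE: "edge-1", Component.COMMUNICATION: "edge-1",
})
print(remote_interactions(everything_on_device, devices, pairs))  # 1
print(remote_interactions(thin_devices, devices, pairs))          # 4
```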

B. Simulators for Collective Adaptive Systems on the Edge
Assessing how a collective adaptive application will perform before its actual deployment is paramount yet somewhat tricky: the system it applies to comprises multiple and possibly heterogeneous entities; as such, most classic testing techniques do not apply straightforwardly; moreover, the correctness (or accuracy) of the system is usually measured with multiple continuous metrics, which need to be evaluated for many different situations. Furthermore, many variables in real-life applications come from experience and are most often estimates. Simulation can be used robustly in these cases, as solutions can be picked based not just on their outcome for the expected values of the parameters, but also on the reliability of their behavior across a wide range of situations. Consequently, simulation has a prominent role in the overall system engineering procedure. Since even functional requirements are generally defined in terms of global behavior, simulation enters the game early, as it is the leanest way to verify whether prototypes work as expected. Simulators meant to support prototyping and development usually abstract away network-level and non-application-level details, prioritizing scalability and the debugging of system behavior. Typical simulators leveraged in the software design phase include Repast [27], NetLogo [28], and Alchemist [29], the latter featuring first-class support for aggregate computing specifications defined in Protelis [13] or ScaFi [30].
Once the system is functionally verified, low-level aspects, such as networking and power consumption, need to be assessed: a solution found to provide the system with the required capabilities may be unsustainable for the actual deployment, or may show degraded performance under circumstances not captured by the abstract simulated model. In such a case, two options are open: searching for another implementation option at the application level, or finding a more efficient mapping of the application logic onto the deployed system (e.g., as proposed in [31]). Systems supporting pulverization offer a principled way to tackle the issue, generalizing the second option: the functional logic is separated from its deployment details, so several configuration options can be explored.
From both surveys, EdgeCloudSim [14] emerges as one of the richest edge-cloud simulators. EdgeCloudSim is a Java application released under an open-source license and based on CloudSim. It has been exploited in dozens of works and application scenarios for evaluating the performance of IoT applications, offloading strategies, resource allocation schemes, and so on. The simulator models a layered architecture composed of end devices, edge, and cloud. The base simulated entity is the task, a process generated by end devices with a predetermined resource and network consumption that can be offloaded to upper layers (edge or cloud) through, respectively, local (WLAN) or wide-area networks (WAN). An orchestrator is in charge of implementing the rules and policies for handling incoming device tasks. EdgeCloudSim focuses on three primary performance metrics: 1) service time (distinguishing between its two components, i.e., networking and computation time); 2) service failure rate (which can be due to networking-, mobility-, or computation-related factors); and 3) resource utilization (in terms of CPU or bandwidth overloads). Simulations are parameterized by simulation time, device count, packet size, task length, and network bandwidth through a set of declarative XML specifications devoted, respectively, to application, device, and scenario modeling, thus relieving the end user from the need to tinker with the simulator source code.
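EdgeCloudSim's split of service time into networking and computation time can be illustrated with a deliberately simplified model. The Python sketch below is not EdgeCloudSim's implementation, which, among other things, models the reduction of data rate due to environmental factors and shared resources. The sketch assumes an idle host and link, a task length expressed in millions of instructions, and a host capacity in MIPS. All parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Task:
    length_mi: float    # computational demand, in millions of instructions (MI)
    upload_kb: float    # data sent to the server, in kilobytes
    download_kb: float  # data returned to the device, in kilobytes


def service_time(task: Task, host_mips: float, bandwidth_mbps: float,
                 propagation_s: float = 0.0) -> Tuple[float, float, float]:
    """Return (network, computation, total) time in seconds for an idle host and link."""
    kilobits = (task.upload_kb + task.download_kb) * 8
    network = kilobits / (bandwidth_mbps * 1000) + propagation_s
    computation = task.length_mi / host_mips
    return network, computation, network + computation


task = Task(length_mi=3000, upload_kb=1500, download_kb=500)
edge = service_time(task, host_mips=10_000, bandwidth_mbps=100)                       # WLAN to the edge
cloud = service_time(task, host_mips=40_000, bandwidth_mbps=20, propagation_s=0.05)  # WAN to the cloud
print(f"edge total: {edge[2]:.3f} s, cloud total: {cloud[2]:.3f} s")
```

With these made-up numbers, the faster cloud host does not compensate for the slower, higher-latency WAN link. This is the kind of trade-off a deployment evaluation has to quantify.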

C. Other Deployment Methodologies
Several approaches dealing with the deployment of complex systems, characterized by different degrees of comprehensiveness, have been proposed so far. Table I shows a qualitative comparison of the main works by specifying their scope (general or special), target (particular objectives and supported deployment), contribution (i.e., from full-fledged methodologies to dedicated strategies and tools), and evaluation environment (namely, network simulators, numerical frameworks, or real testbeds). In particular, it can be immediately noticed that they mainly focus on the allocation/placement of generic resources, tasks, services, or applications on a double-layer infrastructure (cloud plus edge/fog), supporting developers with a single contribution evaluated through a network simulator. Conversely, the methodology we propose and its associated toolchain are flexible enough to open a larger design space (the IoT-edge-fog-cloud continuum), enabling the definition of the business logic of a distributed software system independently of any deployment constraint or concern, assuming that the underlying software system can be pulverized. In this sense, the methodology introduces a cleaner separation of concerns; the main advantage with respect to traditional methods is the possibility of deferring the choice of which host will ultimately execute which part of the software.
TABLE I
QUALITATIVE COMPARISON OF DEPLOYMENT APPROACHES ACCORDING TO THEIR PURPOSE (GENERAL OR SPECIAL), GOAL, TARGET INFRASTRUCTURE (IOT, EDGE, FOG, CLOUD), CONTRIBUTIONS (METHODOLOGY, MODELS, (OPTIMAL) ALGORITHMS, STRATEGY, TOOL, TOOLCHAIN), AND EVALUATION ENVIRONMENT (SIMULATOR, TESTBED, NUMERICAL FRAMEWORK)

III. METHODOLOGY
This section presents our contribution, which consists of a methodology for simulation-based deployment evaluation (Section III-A) and an implementation of the methodology for the aggregate computing paradigm, along with a prototype toolchain supporting its various phases (Section III-B).

A. Methodology for Simulation-Based Deployment Evaluation for Pulverizable IoT-Edge-Fog-Cloud Systems
As discussed in Section II-A2, pulverized systems provide a clean separation between the application business logic and its actual deployment. This section discusses how this feature can be leveraged in a methodology, providing insights into how combinations of different pulverizations would behave on different infrastructures. The input information for the method has two elements: 1) functional requirements for the application; and 2) possible target infrastructures. Functional requirements must be satisfied by creating an appropriate specification in a pulverizable language or platform (in our prototype implementation, we relied on the aggregate programming language Protelis [13]), producing a partitioned application. The available potential target infrastructures need to be captured into a formal machine-readable model. This operation is similar to a "reverse" IaC [12]. In IaC, computation resources (servers, virtual machines, and their configuration) are managed and provisioned via machine-readable declarative configuration files; in our case, configuration files should be produced as descriptors of the possible infrastructural configurations. Crucially, if the final system's target infrastructure is provisioned via IaC (which is likely for distributed systems using a modern DevOps automation pipeline), the IaC descriptor could easily play the role of infrastructure descriptor for the proposed methodology as well.
Information on the infrastructure model and the partitioned application is then provided to a deployment generator, a configurable software component that finds all possible valid deployments of pulverized components onto the possible infrastructures, generating the corresponding simulation files. Simulations are then executed, in our case, by relying on EdgeCloudSim, and performance analysis is performed. The methodology is summarized in Fig. 2.
Results can then be: 1) interpreted by the developers to gain insights into the most suitable pulverization strategy for the system; 2) used by the operations team to figure out which deployment meets the requirements while saving resources; and 3) even integrated into the automated quality assurance pipeline. In this sense, the proposed methodology also contributes to the evolution of best practices for the development of distributed systems by enabling a predeployment performance evaluation that can influence the IaC and deployment phases, for instance, by blocking the deployment if the system is found to have a relevant performance regression.
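The quality-assurance use of the results can be pictured as a gate that compares the metrics of a candidate deployment against a baseline and fails the pipeline on a regression. The Python sketch below is hypothetical. The metric names, the lower-is-better orientation, and the 5% relative tolerance are assumptions and are not part of the described toolchain.

```python
from typing import Dict, Tuple

# Every metric is oriented so that lower is better (e.g., service time, failure rate).
Metrics = Dict[str, float]


def find_regressions(baseline: Metrics, candidate: Metrics,
                     tolerance: float = 0.05) -> Dict[str, Tuple[float, float]]:
    """Metrics whose candidate value exceeds the baseline by more than `tolerance` (relative)."""
    return {
        name: (baseline[name], candidate[name])
        for name in baseline
        if candidate[name] > baseline[name] * (1 + tolerance)
    }


def gate(baseline: Metrics, candidate: Metrics, tolerance: float = 0.05) -> int:
    """Report regressions and return a process exit code (non-zero blocks the deployment)."""
    regressions = find_regressions(baseline, candidate, tolerance)
    for name, (old, new) in regressions.items():
        print(f"regression on {name}: {old} -> {new}")
    return 1 if regressions else 0


baseline = {"service_time_ms": 100.0, "failure_rate": 0.02}
candidate = {"service_time_ms": 104.0, "failure_rate": 0.03}
exit_code = gate(baseline, candidate)  # in a CI job: sys.exit(exit_code)
```

Here the 4% increase in service time falls within the tolerance, whereas the higher failure rate does not, so the gate returns a non-zero code.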

B. Application to Aggregate Computing
We have created a prototype implementation of the tooling required to apply the methodology [52]. In particular, we selected the Protelis aggregate programming language for the pulverizable behavior specification, and the EdgeCloudSim platform for simulating the deployed system; both are framed in blue-filled boxes in Fig. 2.
The deployment generator module is at the core of the approach, representing the novel element introduced in the toolchain. It works as an adapter between the high-level pulverized program specification and the network simulation tool, and has the following responsibilities.
1) Given a behavioral specification (in our case, a Protelis program), providing cost models compatible with the low-level simulator (in our case, EdgeCloudSim).
2) Given a set of possible infrastructures, filtering those compatible with the requirements of the pulverized system.
3) For all the plausible combinations of infrastructures and deployments of pulverized components, generating all the valid simulation configurations.
4) Analyzing all these configurations by exercising the network simulator.
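Responsibilities 2) and 3) of the deployment generator amount to a constrained enumeration. The Python sketch below assumes that each infrastructure is described by the set of tiers it offers and that each component is described by the tiers it may run on. The option table and the co-location constraint are hypothetical stand-ins for the actual descriptors. The sketch does not cover turning each generated pair into simulator input files.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical placement options for each pulverized component.
OPTIONS: Dict[str, List[str]] = {
    "sensors": ["device"],
    "actuators": ["device"],
    "behavior": ["device", "edge", "cloud"],
    "state": ["device", "edge", "cloud"],
    "communication": ["device", "edge", "cloud"],
}


def is_valid(deployment: Dict[str, str], tiers: Set[str]) -> bool:
    # The infrastructure must offer every tier used by the deployment...
    if not set(deployment.values()) <= tiers:
        return False
    # ...and, as an example constraint, state stays with behavior.
    return deployment["state"] == deployment["behavior"]


def generate(infrastructures: Dict[str, Set[str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Enumerate every valid (infrastructure, deployment) pair; each would become a simulation config."""
    names = list(OPTIONS)
    configs = []
    for infra, tiers in infrastructures.items():
        for choice in product(*(OPTIONS[n] for n in names)):
            deployment = dict(zip(names, choice))
            if is_valid(deployment, tiers):
                configs.append((infra, deployment))
    return configs


configs = generate({"edge-and-cloud": {"device", "edge", "cloud"}, "edge-only": {"device", "edge"}})
print(len(configs))  # 13
```

The count is 13 because the full infrastructure allows 3 × 3 placements of behavior (with state following it) and communication, and the edge-only infrastructure allows 2 × 2.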

1) Complexity Estimation of Pulverized Aggregate Programs: The first step of our analysis methodology requires estimating the resources needed to run a pulverized system. Several options exist in this direction, capturing the specification at different levels of abstraction and providing estimates of different granularity.
One approach is static analysis [53], which takes as input the source code, the binaries it produces (if compiled), or some intermediate product of compilation; builds an internal model of the program; and then performs the analysis by searching for known patterns, without actually executing the code. Usually, static analysis tools are meant to intercept style inconsistencies, dodgy code snippets, bad practices, buggy patterns, and security vulnerabilities. However, the same technique can also be used to gain insights into complexity and, especially through data-flow analysis (namely, static prediction of the possible runtime values of some variables), into bounds on the size of the exchanged network messages. In our prototype implementation, we rely on static analysis to estimate the computation load required by Protelis programs: we intercept the intermediate representation of the abstract syntax tree produced by the Protelis interpreter before execution, and then estimate the execution cost of each subtree. Since Protelis is higher-order [54], we had to take into account function references and lambda expressions, hijacking the standard interpretation machinery to explore their bodies. The peculiar Xtext-based implementation [55] of the language was relevant in simplifying the process and in driving the choice of Protelis as the target language for the analysis. On the other hand, the lack of a static type checker hindered our data-flow analysis, so we expect other aggregate computing implementations (such as the Scala-internal DSL ScaFi [24]) to be amenable to a more detailed analysis (although at a higher implementation cost due to the complexity of the host language).
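The subtree-by-subtree estimation can be pictured as a recursive traversal. The Python sketch below works on a hypothetical miniature AST, not on the Protelis interpreter's actual representation. The node kinds, the cost table, and the choice to sum all branches of a conditional (which yields an upper bound) are assumptions. Function references are followed into the referenced body, with a guard against recursion.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Node:
    op: str                                  # operation, e.g., "sense", "nbr", "call"
    children: List["Node"] = field(default_factory=list)
    ref: Optional[str] = None                # referenced function name, for "call" nodes


# Hypothetical per-operation costs, in abstract instruction units.
COST: Dict[str, int] = {"const": 1, "sense": 5, "add": 1, "min": 2, "nbr": 10, "if": 1, "call": 3}


def estimate(node: Node, functions: Dict[str, Node],
             visiting: FrozenSet[str] = frozenset()) -> int:
    """Cost of a subtree: its own cost plus its children's (all branches counted: an upper bound)."""
    cost = COST.get(node.op, 1) + sum(estimate(c, functions, visiting) for c in node.children)
    if node.op == "call" and node.ref in functions:
        if node.ref in visiting:  # recursion cannot be bounded by this simple analysis
            raise ValueError(f"recursive reference to {node.ref!r}")
        cost += estimate(functions[node.ref], functions, visiting | {node.ref})
    return cost


functions = {"distance": Node("min", [Node("add", [Node("nbr", [Node("sense")]), Node("const")])])}
program = Node("if", [Node("sense"), Node("call", ref="distance"), Node("const")])
print(estimate(program, functions))  # 29
```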
Another approach to evaluating the expected performance of code is (micro)benchmarking, which requires instrumented execution of the software and measures (rather than estimates, as static analysis does) the cost of executing it. Although this kind of measurement sounds attractive, the measurements may be affected by very significant errors, and the outcomes could be much less precise than expected [56]. This is due to the inherent complexity of modern computers and software stacks. CPUs are equipped with several layers of caches that heavily impact performance (e.g., even comparing different list implementations by timing their use can generate astonishing results); operating system scheduling policies introduce additional variability; compiler tuning can produce, from the same source, executables with different performance; and, finally, language runtimes such as the Java Virtual Machine or the Common Language Runtime introduce further layers of complexity due to internal caching, garbage collection, just-in-time (de)optimization [57], and other mechanisms. Even though, in principle, (micro)benchmarking is a viable option for a complexity estimator and should probably be part of an all-round tool, the vast number of variables to keep in check made it unsuitable for a prototype demonstrator such as the one presented in this work; thus, we used static analysis.
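The variability argument is easy to observe first-hand. The Python sketch below is an illustration and is not part of the toolchain. It uses the standard `timeit` module to time the same statement in several independent batches and reports how far the slowest batch is from the fastest. The statements and batch sizes are arbitrary choices. The printed numbers will differ across machines and runs, which is precisely the point.

```python
import statistics
import timeit
from typing import Dict


def benchmark(stmt: str, setup: str = "pass", repeat: int = 7, number: int = 1000) -> Dict[str, float]:
    """Time `stmt` in `repeat` independent batches and summarize the per-call dispersion."""
    batches = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    per_call = [t / number for t in batches]
    fastest = min(per_call)
    return {
        "min": fastest,
        "median": statistics.median(per_call),
        "max": max(per_call),
        # relative spread between the slowest and the fastest batch
        "rel_spread": (max(per_call) - fastest) / fastest if fastest > 0 else float("inf"),
    }


for label, stmt, setup in [("list append", "xs.append(1)", "xs = []"),
                           ("list insert at head", "xs.insert(0, 1)", "xs = []")]:
    r = benchmark(stmt, setup)
    print(f"{label}: median {r['median'] * 1e9:.1f} ns/call, spread {r['rel_spread']:.0%}")
```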
As mentioned in Section II-B, a simulation step is usually required during design to understand whether the desired behavior is being achieved. These high-level simulation tools, although often not capturing enough low-level details, can be leveraged to extract valuable information on the system. For instance, the Alchemist simulator [29] runs actual aggregate code in simulations. We indirectly exploited this capability by extracting the small portion of code in charge of executing the aggregate program and emulating the delivery of messages to other devices, configuring it manually using the Protelis networking module, and interposing a serializer in between: this way, we estimated message sizes rather precisely. This is an instance of a more general practice, where simulators used in the design step are leveraged to obtain insights on the behavior of the system (in our case, the expected message size), and this information is then used in the detailed deployment evaluation.
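The serializer-interposition technique can be sketched independently of Alchemist and Protelis. In the Python sketch below, JSON stands in for the actual serialization format. This is an assumption, because real message sizes depend on the serializer used. The message fields are hypothetical, and `deliver` is whatever function would hand the bytes to the neighbors.

```python
import json
import statistics
from typing import Any, Callable, Dict, List


class SizeRecordingChannel:
    """Interposes a serializer between message production and delivery, recording payload sizes."""

    def __init__(self, deliver: Callable[[bytes], None]) -> None:
        self.deliver = deliver
        self.sizes: List[int] = []

    def send(self, message: Any) -> None:
        payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
        self.sizes.append(len(payload))
        self.deliver(payload)

    def summary(self) -> Dict[str, float]:
        return {"mean": statistics.mean(self.sizes), "max": max(self.sizes)}


# Usage: record the outbound messages of two rounds (hypothetical contents).
delivered: List[bytes] = []
channel = SizeRecordingChannel(delivered.append)
channel.send({"gradient": 2.0, "leader": 7})
channel.send({"gradient": 13.5, "leader": 42})
print(channel.sizes, channel.summary())
```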
2) Formal Deployment Specifications: As discussed in Section III-A, the proposed approach requires a formal specification of the actual target platform. In a full-fledged tooling implementation, the most likely reification would be a direct translation of the IaC definitions; however, there is no physical target in our prototypical proof of concept. For this reason, and for the sake of simplicity, we developed our own lightweight syntax rather than implementing an ICT infrastructure model that translates existing tools' definitions into EdgeCloudSim-compatible environment descriptors. A relevant factor in our decision was the lack, at the time of writing, of any widely accepted standard language for IaC. Arguably, the most widespread syntax is the custom language used by the Terraform tool [58], which would require a custom parser.
Our descriptor is thus a standard YAML file capturing hardware, networking, and infrastructure parameters. An excerpt of such a configuration is presented in Fig. 3. Multiple values can be specified for most keys, and every combination of such values is then tested in simulation.
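Testing every combination of multi-valued keys is a Cartesian product. The Python sketch below starts from the kind of dictionary that a YAML loader (e.g., PyYAML's `yaml.safe_load`) would return. The keys and values are hypothetical and do not reproduce Fig. 3. Scalars are treated as one-element lists, so single-valued keys are simply carried over.

```python
from itertools import product
from typing import Any, Dict, List


def expand(descriptor: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand a descriptor whose values may be lists into all single-valued combinations."""
    keys = list(descriptor)
    choices = [v if isinstance(v, list) else [v] for v in descriptor.values()]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


# A parsed descriptor (keys are hypothetical).
descriptor = {"edge_servers": 9, "end_devices": [100, 200, 400], "wlan_bandwidth_mbps": [200, 400]}
scenarios = expand(descriptor)
print(len(scenarios))   # 6
print(scenarios[0])     # {'edge_servers': 9, 'end_devices': 100, 'wlan_bandwidth_mbps': 200}
```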
A second descriptor, exemplified in Fig. 4, defines the possible deployments of pulverized components. Omitted keys have a default target: in particular, the state component is deployed alongside the behavior component unless otherwise specified, and sensors and actuators are deployed on end devices.
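The defaulting rules for omitted keys can be captured in a few lines. The Python sketch below operates on an already-parsed deployment entry. The tier labels, the dictionary shape, and the requirement that behavior be placed explicitly are assumptions. The text does not state defaults for behavior or communication, so the sketch leaves communication untouched when it is absent.

```python
from typing import Dict

END_DEVICE = "end-device"  # hypothetical label for the end-device tier


def resolve_defaults(spec: Dict[str, str]) -> Dict[str, str]:
    """Fill omitted component targets following the defaults described in the text."""
    if "behavior" not in spec:
        # Assumption of this sketch: behavior placement is mandatory.
        raise ValueError("the behavior component must be placed explicitly")
    resolved = dict(spec)
    resolved.setdefault("state", resolved["behavior"])  # state follows behavior unless overridden
    resolved.setdefault("sensors", END_DEVICE)
    resolved.setdefault("actuators", END_DEVICE)
    return resolved


print(resolve_defaults({"behavior": "edge", "communication": "cloud"}))
print(resolve_defaults({"behavior": "edge", "state": "cloud"}))  # an explicit state target wins
```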

3) Automated Simulation Execution and Data Analysis: Our tool performs three tasks to execute all the required simulations: a generation task, which compiles the input descriptors and generates a set of EdgeCloudSim configuration files; an execution task, which runs all simulations; and an analysis task, which produces graphs. The generation task: 1) computes the Cartesian product of hardware, networking, and infrastructure parameters; 2) estimates the Protelis program complexity and the related message size; 3) generates EdgeCloudSim configuration files for each combination of device count, edge server count, and (if an interval is provided rather than a number) Protelis program complexity and message size; and 4) finally produces a descriptor for the execution task with instructions on all the combinations to be executed. Once this is done, the execution task is in charge of launching multiple repetitions of each simulation. Finally, the analysis task processes the produced results. The final part of the evaluation depends on the nonfunctional requirements of the application. Unless a deployment configuration is strictly superior to another under all metrics (which is unlikely in real-world situations), identifying the best one requires weighing the impact of every metric of interest. For instance, for a real-time social distributed game, latency could be vastly more important than data rate; conversely, if the application business logic concerns streaming prerecorded high-definition videos, data rate predominates over low latency. In our case, the analysis task has been used to produce the charts included in Section IV, along with many others that helped us better interpret the system behavior and that were not included for the sake of brevity (they are available in the same repository hosting the prototype [52]). We have written the former two tasks in Kotlin as part of the Gradle-based build automation tooling of the prototype.
Instead, charts have been produced via a MATLAB script, executed via Octave through a GitHub Action.
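One way to weigh several metrics is to discard dominated configurations first and then rank the remaining ones by weights that encode the application's priorities. The Python sketch below illustrates this approach and is not the prototype's analysis task, which produces charts. The metric names, values, and weights are made up. In practice, metrics would need normalizing before weighting.

```python
from typing import Dict, Iterable, List

Metrics = Dict[str, float]


def orient(m: Metrics, maximize: Iterable[str]) -> Metrics:
    """Negate higher-is-better metrics so that lower is better for every metric."""
    flip = set(maximize)
    return {k: -v if k in flip else v for k, v in m.items()}


def dominates(a: Metrics, b: Metrics) -> bool:
    """`a` is no worse than `b` on every metric and strictly better on at least one."""
    return all(a[k] <= b[k] for k in a) and any(a[k] < b[k] for k in a)


def pareto_front(results: Dict[str, Metrics], maximize: Iterable[str] = ()) -> List[str]:
    o = {name: orient(m, maximize) for name, m in results.items()}
    return [n for n in o if not any(dominates(o[x], o[n]) for x in o if x != n)]


def rank(results: Dict[str, Metrics], weights: Dict[str, float],
         maximize: Iterable[str] = ()) -> List[str]:
    """Order the non-dominated configurations by a weighted sum of oriented metrics."""
    maximize = list(maximize)

    def score(name: str) -> float:
        o = orient(results[name], maximize)
        return sum(w * o[k] for k, w in weights.items())

    return sorted(pareto_front(results, maximize), key=score)


results = {
    "A": {"latency_ms": 40.0, "rate_mbps": 8.0},
    "B": {"latency_ms": 120.0, "rate_mbps": 25.0},
    "C": {"latency_ms": 130.0, "rate_mbps": 20.0},  # dominated by B
}
print(pareto_front(results, maximize=["rate_mbps"]))                                  # ['A', 'B']
print(rank(results, {"latency_ms": 1.0, "rate_mbps": 1.0}, maximize=["rate_mbps"]))    # game: ['A', 'B']
print(rank(results, {"latency_ms": 0.01, "rate_mbps": 10.0}, maximize=["rate_mbps"]))  # streaming: ['B', 'A']
```

Latency-heavy weights (as for the real-time game) put configuration A first, while rate-heavy weights (as for video streaming) favor B. C never appears because B is better on both metrics.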

IV. EVALUATION
This section exercises our proposed methodology and toolchain by performing a deployment analysis for a case study from the literature. We selected an existing implementation to show that the approach applies to existing code without any impact on the specification. More specifically, we took an implementation of the self-organizing coordination regions pattern [6] applied to the collection of local user-generated multimedia streams. Our goal is to demonstrate that, through the proposed methodology, a predeployment analysis can be executed given a specification of the target infrastructure and the pulverized behavior. Section IV-A reports how the experiment is configured, and Section IV-B presents evaluation results.

A. Configuration
The behavior of the simulated system is obtained from the experiments presented in [6]: we ran our analyzer on the Protelis source code found in [59], obtaining an estimate of the average number of millions of instructions required to compute the behavior, as well as an estimate of the message size.
We then created the infrastructure descriptor, modeling a reasonably realistic target platform. Our reference target comprises nine edge servers, each with an associated Wi-Fi access point to which end devices can connect. Given a maximum data rate, EdgeCloudSim internally models its reduction due to environmental factors and shared resources. We selected such a maximum data rate by observing results in the literature for 802.11ac Wave-1-certified [60] and Wave-2-certified [61] […] C, similar to B C but with the additional constraint that behavior and communication components are forced to be located on the same host. In every case, components deployable on both edge and cloud (see Table II for the available options) are hosted on the cloud with probability P and on the edge otherwise. We consider the following cases as baselines, as they could be realized traditionally, without pulverization: 1) C with P = 0: an application designed from the beginning to run on the end devices and communicate through the edge; 2) C with P = 1: an application designed from the beginning to run on the end devices and communicate through cloud-mediated messages; 3) B C with P = 0: an application designed from the beginning to delegate everything but sensing and actuation to the edge servers (end devices are considered thin) and communicate through the edge; and 4) B C with P = 1: an application designed from the beginning to delegate everything but sensing and actuation to the cloud (end devices are considered thin) and communicate through the cloud.
End devices move following a nomadic migration model: they spend some time in the proximity of an edge server and may then migrate elsewhere. The time spent at each destination depends on its attractiveness parameter (specified in EdgeCloudSim configuration files), while all destinations have the same probability of being reached.
6 https://archive.is/g7jN9
7 https://archive.is/M0ChF
Fig. 6. Average network delay by deployment type (columns) and P (rows). The last two columns (B C and C) of the first (P = 0) and the last (P = 1) rows, surrounded by black boxes, can be considered baselines, as these data could be generated and studied with classic approaches too.
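The rule that components deployable on both edge and cloud are hosted on the cloud with probability P can be sketched as follows. The component catalogue and the `place` function are hypothetical (the actual options are those in Table II), and the co-location constraint is simplified to placing the listed components on the same tier rather than on the same host.

```python
import random

# Hypothetical catalogue of where each component may run (the real options
# are those listed in Table II).
DEPLOYABLE = {
    "sensors": {"device"},
    "actuators": {"device"},
    "behavior": {"edge", "cloud"},
    "communication": {"edge", "cloud"},
}


def place(components, p_cloud, rng, colocate=()):
    """Map each component to a tier.

    Components deployable on both edge and cloud go to the cloud with
    probability p_cloud and to the edge otherwise. Components listed in
    `colocate` are moved to the tier of the first one (a tier-level
    simplification of the same-host constraint)."""
    placement = {}
    for name, tiers in components.items():
        if tiers == {"edge", "cloud"}:
            placement[name] = "cloud" if rng.random() < p_cloud else "edge"
        elif len(tiers) == 1:
            placement[name] = next(iter(tiers))
        else:
            raise ValueError(f"unsupported tiers for {name}: {tiers}")
    for name in colocate[1:]:
        placement[name] = placement[colocate[0]]
    return placement
```

For example, `place(DEPLOYABLE, 0.3, random.Random(42), colocate=("behavior", "communication"))` draws one placement in which behavior and communication share a tier. Since `random()` returns values in [0, 1), P = 0 always yields the edge and P = 1 always yields the cloud, matching the baseline cases.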
The whole experiment has been documented, automated, and published as open source in a public repository 8 [52] to facilitate accessibility and reproducibility. Unfortunately, EdgeCloudSim does not allow simulations to be seeded, so re-executing the process will produce slightly different results.
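The nomadic migration model described in this section can be approximated by the following sketch of a single device's trajectory. The exponential dwell time, the mean values per attractiveness level, and the choice of excluding the current destination when migrating are assumptions for illustration, not EdgeCloudSim's documented behavior.

```python
import random

# Hypothetical mean dwell times (seconds) per attractiveness level; in the
# experiment these parameters live in the EdgeCloudSim configuration files.
MEAN_DWELL_S = {1: 480.0, 2: 300.0, 3: 120.0}


def nomadic_trace(destinations, start, horizon_s, rng):
    """Simulate one end device; return (arrival_time_s, destination) visits.

    destinations maps each destination id (e.g., an edge server's access
    point) to its attractiveness level."""
    t, here, trace = 0.0, start, []
    while t < horizon_s:
        trace.append((t, here))
        # Dwell time: exponential with a mean set by attractiveness (assumed).
        t += rng.expovariate(1.0 / MEAN_DWELL_S[destinations[here]])
        # Next destination: uniform over all other destinations.
        here = rng.choice([d for d in destinations if d != here])
    return trace
```

Attractiveness affects only how long a device stays, while the next destination is drawn uniformly; this separation reflects the two properties stated for the mobility model.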

B. Results
Simulation results are summarized in Fig. 6 for network delays and in Fig. 7 for the task failure rate. Data are the mean over seven simulation runs.
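Because EdgeCloudSim runs cannot be seeded, each reported value aggregates repeated runs. The following minimal sketch shows how one metric could be summarized over those runs, reporting the run-to-run spread alongside the mean. The `summarize` function and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(runs):
    """Return (mean, sample standard deviation, standard error of the mean)
    of one metric over repeated, unseeded simulation runs (needs >= 2 runs)."""
    m = mean(runs)
    s = stdev(runs)
    return m, s, s / sqrt(len(runs))


# Hypothetical failed-task counts from seven runs of one configuration.
failed_tasks = [10, 12, 11, 9, 13, 10, 12]
avg, sd, se = summarize(failed_tasks)
```

For these made-up values, the mean is 11 and the sample standard deviation is √2 (about 1.41). Reporting the spread helps judge whether differences between deployment types exceed run-to-run noise.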
First, the proposed methodology widens the design space of the distributed application at hand. With traditional development methodologies, the application would have had to be designed from the start to work either on thick devices (C deployments) or on thin end devices with behavior and communication logics co-hosted (B C deployments); moreover, since individual application parts are designed with a specific communication machinery in mind, either the cloud (P = 1) or the edge (P = 0) is used. The proposed methodology opens the door to many further deployment schemes, as all choices on how single components communicate and where they can be deployed are delayed until after the design of the system business logic is complete. Analysis of the data for the various deployment schemes shows that none of them clearly dominates the others. The average network delay charts show a peak due to an increased rate of network failures, so these data must be read together with the probability that tasks complete successfully. In fact, we observe that in 2-tiered scenarios most failures occur on the cloud side and, by comparing failures with the corresponding network delays, that decreases in average network delay match increases in the failure rate. In 1-tiered scenarios (edge only), we observe, as expected, a lower average network delay than in the other scenarios, but a higher percentage of failed tasks, most likely due to saturation of the edge servers' computational resources.
Fig. 7. Percent of failed tasks by deployment type (columns) and P (rows). The last two columns (B C and C) of the first (P = 0) and the last (P = 1) rows, surrounded by black boxes, can be considered baselines, as these data could be generated and studied with classic approaches too. Solid-filled areas represent the share of failed tasks that were assigned to the cloud.
For the specific scenario (multimedia streaming), unless the streaming is intended for real-time reuse (e.g., augmented reality gaming), latency is likely more tolerable than failure. Data show that for a relatively low number of users (below 200) a 1-tiered architecture is a viable solution, provided that a B C deployment is performed, as B C has a more erratic behavior due to the components being migrated separately, and C deployments stress the network too much. To support larger systems, a 2-tiered architecture is necessary, and in this case a B C deployment with a low P seems to be the most scalable solution. Adding an edge orchestrator in this situation is not particularly helpful: it improves performance only when the system is not under stress, whereas under heavy load the performance of architectures with and without an orchestrator tends to converge.
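The absence of a clearly dominant deployment, and the need to weigh metrics by application priorities, can be made concrete with a dominance check followed by a weighted ranking. This is a sketch, not part of the published toolchain: the configuration names, metric values, and weights are hypothetical, and both metrics (network delay, failed-task percentage) are treated as lower-is-better.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly better
    on at least one (all metrics here are lower-is-better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(configs):
    """Names of configurations not dominated by any other configuration."""
    return [
        name for name, m in configs.items()
        if not any(dominates(o, m) for other, o in configs.items() if other != name)
    ]


def rank(configs, weights):
    """Order configurations by a weighted sum of their metrics (lower is
    better); the weights encode application priorities."""
    return sorted(configs, key=lambda n: sum(w * v for w, v in zip(weights, configs[n])))


# Hypothetical (delay_s, failed_pct) per configuration.
configs = {
    "edge_only": (0.05, 8.0),
    "edge_cloud_low_p": (0.20, 3.0),
    "cloud_only": (0.40, 4.0),
}
```

With these made-up numbers, `pareto_front(configs)` keeps `edge_only` and `edge_cloud_low_p`, since neither beats the other on both metrics. Equal weights rank `edge_cloud_low_p` first, whereas a latency-only weighting such as `(100.0, 0.0)` ranks `edge_only` first, showing how the preferred deployment depends on whether latency or failure matters more.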

V. CONCLUSION
The development of collective, dynamic, heterogeneous, and scalable IoT systems in complex and uncertain scenarios is an engineering task as important as it is challenging: 1) from the functional viewpoint, such systems require programming paradigms that inherently support essential features of autonomy, decentralization, and adaptiveness; and 2) from the deployment viewpoint, due to the entanglement of factors related to computation, networking, and mobility, they can be mapped onto several alternative and potentially suboptimal settings, with the risk of ineffective deployments and reconfigurations. Therefore, computing paradigms that express edge intelligence by separating the logic and deployment planes, as well as simulation tools for the preliminary and comprehensive evaluation of different target deployment configurations, are key enablers for developing complex IoT systems effectively and efficiently.
Along this research direction, in this article, we have presented a methodology and a toolchain for capturing candidate deployments of smart edge services and predicting their performance and cost by simulation. In particular, we tailored this approach to aggregate computing, since its architecture enables pulverization (namely, partitioning into logical components and deployment units) that can be spread across the IoT–edge–fog–cloud continuum and accordingly simulated on EdgeCloudSim. We then demonstrated the potential of our approach on an edge multimedia streaming case study that mirrors the challenges and requirements of complex IoT systems (large scale, dynamicity, and adaptivity) and demands edge intelligence for load balancing through dynamic clustering. In particular, we showed that, starting from the same pulverized specification, different target deployments of the components yield systems with different performance. Each type of deployment works better under some circumstances, depending, for example, on the expected number of users and the availability of computational and network resources. Since our methodology and the related prototype tool enable an a priori estimation of the deployed system performance, they can be exploited by operations teams to select the most suitable target platform for the system deployment, or integrated into an automated deployment pipeline as an additional quality control step.
In the future, we plan to realize a variant of the presented methodology in which the simulation step is replaced by an optimization method, similar to the ones proposed in [45], [47], and [49]. Moreover, we intend to perform further evaluations of the methodology, including on other use cases.