From Zero to Fog: Efficient Engineering of Fog-Based Internet of Things Applications

In IoT data processing, cloud computing alone does not suffice due to latency constraints, bandwidth limitations, and privacy concerns. By introducing intermediary nodes closer to the edge of the network that offer compute services in proximity to IoT devices, fog computing can reduce network strain and high access latency to application services. While this is the only viable approach to enable efficient IoT applications, the issue of component placement among cloud and intermediary nodes in the fog adds a new dimension to system design. State-of-the-art solutions to this issue rely on either simulation or solving a formalized assignment problem through heuristics, which are both inaccurate and fail to scale with a solution space that grows exponentially. In this paper, we present a five-step process for designing practical fog-based IoT applications that uses best practices, simulation, and testbed analysis to converge towards an efficient system architecture. We then apply this process in a smart factory case study. By deploying filtered options to a physical testbed, we show that each step of our process converges towards more efficient application designs.


Introduction
For more than a decade, cloud computing has been the dominant paradigm in designing and deploying software services. However, it is not a good fit for new application domains such as the Internet of Things (IoT). Sending the world's IoT data to a centralized cloud for processing is both inefficient and prohibitively expensive [1]. Processing should instead happen where IoT data is generated and needed [2]. Fog computing, as first proposed by Bonomi et al. [3], supplies the necessary paradigm shift: It extends the cloud to the edge of the network so that applications can leverage additional infrastructure between the cloud and end-devices. From powerful data centers in larger cities to small, single-board computers co-located with cellular base stations, application designers can deploy their services not only in a central cloud but anywhere on the edge-cloud continuum. While cloud resources still provide elastic, seemingly infinite scalability at low cost, edge infrastructure offers service consumers low latency access while also consuming less network bandwidth [2]. Overall, fog computing enables previously impossible application architectures but at the same time makes application design more complex. When designing fog-based IoT applications, the placement of software services within the fog is now a new dimension to be considered in addition to actually building the application.
* Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 415899119. This work has been published in Wiley - Software: Practice and Experience vol. 51 no. 8 pp.
Correctly placing services, however, is vital in leveraging fog computing for the IoT as it directly influences both the quality and cost of applications. At the same time, the number of deployment options increases exponentially with each service or location. Existing approaches to designing fog-based IoT data processing applications all have their drawbacks: First, there are those that try to parametrize the entire system to form an optimization problem solved algorithmically (e.g., [4][5][6][7][8]) or via simulation (e.g., [9][10][11]). While such approaches are highly valuable for providing insights, their accuracy is inherently limited by the assumptions of the model. In particular, such approaches cannot capture runtime characteristics of actual software components, which are highly dependent on concrete implementation choices. Information on implementation choices, however, is not available at design time and their actual impact depends on the target hardware. Alternatively, a second approach is to follow guidelines, best practices, or reference architectures such as [12][13][14][15]. Although useful as a starting point, these generalized target scenarios are not sufficient for a specific use-case. Third, there are approaches that introduce tooling to create (emulated) fog testbeds (e.g., [16][17][18]) to deploy, test, and benchmark applications. While such experiments can provide the most accurate insights into application behavior, the high level of accompanying effort and running costs make them a poor fit for exploring thousands or millions of deployment options.
In this paper, we propose a new process for designing efficient fog-based systems that combines and extends existing approaches, namely following best practices, simulation, and testbed emulation. This combination enables us to leverage the advantages of each approach while mitigating their respective limitations. For instance, we apply best practices to reduce the parameter space for simulation, which avoids the cost of simulating the entire parameter space without sacrificing the accuracy of simulation results. Our overall goal is thus to identify an efficient fog application design as effectively as possible.
To this end, we make two core contributions:
• We extend and integrate previous research of ours into a novel framework. We use best practices [12], simulation with FogExplorer [9,10], and infrastructure emulation with MockFog [16] (Section 3).
• We implement a smart factory application following our proposed process and compare the final application design to a range of discarded design options in experiments on a physical fog testbed (Section 4).

IoT Applications in the Fog
In this section, we summarize fog computing concepts and discuss characteristics of fog-based IoT applications and efficient IoT application design.

Characteristics and Challenges of Fog Computing
Our definition of fog computing is adapted from [2]. Fog computing is the extension of the cloud toward the edge of the network. The idea is to distribute applications across a wide variety of infrastructure including cloud resources, intermediary nodes, edge computing, and even on-device computation. In this way, application developers can leverage both low access latency at the edge and scalability in the cloud. We show an example of a layered fog architecture in Fig. 1.
Because fog computing combines platforms from different vendors (e.g., a cloud provider or a network provider), heterogeneity is a major challenge. Different platforms are likely to provide different programming models and service levels. Furthermore, intermediary and especially edge nodes are also likely to be more expensive and less scalable than their cloud counterparts.
A major obstacle to using fog computing is that applications need to be deployed in a distributed manner, with different software components placed on different nodes in the fog. This is impossible when dealing with traditional monolithic information systems. Only a modularized application split into distinct software services allows each service to be placed at specific locations within the fog, whether that be in the cloud or toward the edge. While increasing the communication overhead, smaller services are necessary for fine-granular scaling and enable more flexibility in service placement on the fog infrastructure [2]. To this end, leveraging lightweight virtualization technologies such as Docker can make software deployment easier [19].

Data Processing Paradigms in IoT Applications
IoT applications analyze data from sensors or process them to trigger actuator devices and software systems [12]. A key characteristic of IoT applications is that they do not follow a request-response model as do user-facing systems. Instead, data move through a processing pipeline in a more "linear" way, typically in the form of a directed acyclic graph (DAG). Overall, there are two classes of IoT data processing: event processing and data analytics. Zhang et al. [1] describe these as "real-time applications with low-latency requirements" and "ambient data collection and analytics," respectively. An application often comprises multiple data processing components that can each be classified individually in this manner.
In event processing, events from the outside world (measured through connected devices) trigger actions in the system and, by extension, possibly in the physical world. The main focus here is time sensitivity to satisfy tight latency requirements. Advantageously, operations are thus also well-defined and simple, and events as data points are small as they carry only metadata [20].
Data analytics is the process of collecting and processing data to obtain information. Complex operations are applied to data from multiple sources over a longer period of time here [21].

Service Level and Cost in Fog Applications
We consider two dimensions of efficiency in fog-based IoT applications: service level and cost.
Service level, often also referred to as quality of service (QoS), can be both the availability of the application and the access latency for specific services [2,22]. Access latency is highly dependent on service placement and is determined by data processing and transmission.
Data processing latency describes the time that passes between the input into the processing unit (e.g., a cloud function) and the output of a computed result. Data transmission latency, on the other hand, is the delay from the first packet of data to be sent by the sender to the last packet of data to be received by the receiver. Availability of individual services is dependent on two main factors: software availability and platform availability. We abstract from software availability in this paper, as software testing is an orthogonal problem to the service placement problem we address. Platform availability depends on the availability of the network, server, or storage. Although cloud platforms also suffer from outages periodically, we consider them to have higher availability than edge or fog devices. Redundancy is managed by the cloud providers and providers are liable for any outages that violate their service level agreements (SLA).
Cost is incurred through the usage of resources in the fog such as compute, storage, and network bandwidth and through upfront investment in IoT devices or other hardware. Generally, compute and storage are far cheaper closer toward the cloud, as providers can leverage economies of scale in large public cloud data centers rather than on the edge, where only privately used devices may be available [2,23]. For network bandwidth, fog platform providers often charge for outgoing and incoming traffic to a data center and IoT devices may use cellular network access where each packet incurs a specific cost. These costs are the main contributors towards the total cost of operating an IoT application.
When designing fog-based IoT applications, different design options result in different service levels and cost. An efficient design offers the best possible QoS levels at the lowest possible cost (i.e., it finds a sweet spot in the QoS and cost tradeoffs [22,24,25], as QoS and cost are not independent of each other). Deploying powerful servers at every edge location minimizes latency but results in higher costs and lower availability. Conversely, moving all services to the cloud minimizes cost and maximizes availability but increases transmission latency at the same time [1,2].

Designing Efficient IoT Applications
In this section, we present our proposed fog application design process. We start by giving a high-level overview of our approach before describing the individual steps in detail.

Overview of Our Five-Step Process
The process we propose for designing efficient fog-based IoT applications comprises five main steps. Initially, there is a broad range of design options, on the order of 10^6, each of which describes a mapping of software services to nodes in the cloud, edge, or in-between (i.e., the service placement). Each step of the proposed process then filters out application design options starting with the Cartesian product of all software and infrastructure models, thereby converging on a limited set of most efficient designs. The key idea is to create a sequence of steps in which each step provides more accurate recommendations than its predecessor but is also more expensive to execute. Since each step reduces the application design space by orders of magnitude, we use more expensive analysis steps for only a limited number of options late in the process while relying on low-cost heuristics in the first steps (see Figure 2 for a high-level overview of the proposed process).
In the first step, we build models of software components and the infrastructure on which the application will be deployed. In each later step, we then extend these models and augment them with additional details as available further in the design process. Finally, we are able to select an efficient fog application design.
In the second step, we apply a set of best practices in IoT data processing. By following these informed rules, we can discard all highly inefficient options at this stage. This reduces the set of options that we have to consider later in the process, enabling us to move through these subsequent steps more efficiently. As the number of available options grows exponentially with each additional component, this step reduces the design options considered in the subsequent steps from millions (∼10^6) to just thousands (∼10^3).
In the third step, we simulate service placement to infrastructure components. This enables us to calculate service cost based on the given cost factors and examine latency constraints for different designs. By introducing service level objectives (SLOs) for parts of the application, we can remove application design options that violate required service levels and instead focus only on inexpensive options that conform to all constraints, reducing the set of viable application design options to the order of 10^1.
In the fourth step, we set up emulated testbeds for each of the remaining application design options to deploy and benchmark software services. As this step is expensive and time-consuming, we propose to use only the options in the 95th percentile of the third step, again reducing the number of considered application design options by orders of magnitude. Based on the number of remaining design options, this selection may be limited or broadened, reducing testing costs or leading to more accurate results, respectively. This process eventually converges toward a small set of highly efficient design options. If available, the options that show the best performance at good cost levels can then be deployed on a physical testbed or the actual infrastructure to measure their performance in their real environment (fifth step).
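The funnel structure of the process can be illustrated with a minimal Python sketch. The function `run_funnel` and the toy stages are hypothetical stand-ins and not part of any tool named in this paper; the point is only that cheap filters run first, so that expensive stages see few options.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

def run_funnel(options: Iterable[T],
               stages: List[Tuple[str, Callable[[List[T]], List[T]]]]) -> List[T]:
    """Apply the filter stages in order (cheapest first) and report survivors."""
    remaining = list(options)
    for name, stage in stages:
        remaining = stage(remaining)
        print(f"{name}: {len(remaining)} options remain")
    return remaining

# Toy stand-ins: each design option is just an integer ID, and each stage
# keeps an arbitrary subset. Real stages would apply the rules of each step.
toy_stages = [
    ("best practices", lambda opts: [o for o in opts if o % 10 == 0]),
    ("simulation", lambda opts: [o for o in opts if o % 1000 == 0]),
    ("emulated testbed", lambda opts: opts[:3]),
]

finalists = run_funnel(range(10_000), toy_stages)
```

In this toy run, the expensive last stage only ever sees ten options, although the funnel started with ten thousand.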

Step 1: Software and Infrastructure Models
Our process requires basic insights into the available runtime infrastructure and the individual software services. These insights can be provided by domain experts. For example, system administrators may be able to provide information on available infrastructure, while application developers can identify and classify application services. We start with a notion of infrastructure components, yet at this early step in the design process we cannot assume that detailed information about runtime infrastructure is available. We therefore need only high-level, abstract descriptions of available data processing locations (such as IoT devices, edge nodes, or cloud platform providers). Such knowledge can, for example, be gained by surveying and analyzing eligible providers and products or by comparing options for IoT devices and gateways. For some more complex use cases, synthesizing possible edge infrastructure configurations as proposed by Rausch et al. [26] could be an alternative approach. Furthermore, possible infrastructure components should be selected with their availability in mind. For applications that require high availability, it is helpful to consider only infrastructure components that can provide sufficiently high platform availability.
Aside from infrastructure components, we also model software components. At this point, no actual implementation has to be available yet. For our model, we use three kinds of components: sources, services, and sinks. Sources are components that produce new data. For an IoT use case, sources are typically IoT sensors. Services consume data and perform operations, thereby producing new data. Services could, for instance, transform data through aggregation or trigger events. Finally, sinks are components that persist data (e.g., a database system), or interact with the physical world based on data (e.g., an IoT actuator). Sinks that persist data can also have a secondary role as sources exposing historical data. We show an example application of this kind in Fig. 3.
We define the overall application as a collection of application paths. Each application path starts with one or more data sources, has a number of services along the way and ends in one sink (i.e., an application path is the DAG of processing steps that leads to a particular sink). Again, these application paths can easily be identified by the application developers as they reflect the application's business logic.
At this point, albeit early in the process, it is already useful to simplify both software and infrastructure models. In most IoT applications, specific components are instances of the same class of components. In a smart home use case, for example, there could be various light bulbs and corresponding light switches. Assuming that each switch controls a number of lights, a pattern emerges. To simplify simulation and benchmarking, we model only one application path and later apply this to all instances of light switches and lights in the system. This also allows our process to scale well and to require less upfront information about the system, while not influencing the results as we merge instances of the same component rather than modifying them.
For sources and sinks, the mapping to infrastructure components is clear, as these are tied to the physical world. An IoT device, for instance, exists as a physical device (i.e., an infrastructure component) and as a source in the software model. Consequently, we only need to consider the placement of services (i.e., the software components that process data) in the subsequent steps of the design process.
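As a minimal sketch of these models, the following Python snippet represents nodes and components as plain data classes and enumerates all service placements as a Cartesian product. The class names, node names, and component names are illustrative assumptions and do not come from the case study.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List

@dataclass(frozen=True)
class Node:
    name: str    # hypothetical names, e.g., "gateway", "edge-dc", "cloud"
    layer: str   # "edge", "intermediary", or "cloud"

@dataclass(frozen=True)
class Component:
    name: str
    kind: str    # "source", "service", or "sink"

def enumerate_placements(services: List[Component],
                         nodes: List[Node]) -> Iterator[Dict[str, str]]:
    """Yield every mapping of service name -> node name."""
    for combo in product(nodes, repeat=len(services)):
        yield {s.name: n.name for s, n in zip(services, combo)}

nodes = [Node("gateway", "edge"), Node("edge-dc", "intermediary"), Node("cloud", "cloud")]
components = [
    Component("sensor", "source"),
    Component("filter", "service"),
    Component("aggregate", "service"),
    Component("dashboard", "sink"),
]

# Sources and sinks are tied to physical devices, so only services are placed.
services = [c for c in components if c.kind == "service"]
placements = list(enumerate_placements(services, nodes))
print(len(placements))  # len(nodes) ** len(services) = 3 ** 2 = 9
```

The number of placements is the number of nodes raised to the number of services. With, say, ten candidate nodes and six services, this already yields 10^6 options, which is why the later steps filter aggressively.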

Step 2: Applying Best Practices
In previous work [12], we proposed best practices for fog-based IoT application design, which we now use to exclude unsuitable application design options. In the following, we will briefly describe how we apply these best practices, which we split into rules for event processing and data analytics application paths.
In event processing, processing is time-sensitive and services should be placed on the shortest communication path between data source(s) and sink, as close to the cloud as possible to minimize cost, and as close to the edge as necessary to fulfill SLOs. As processing a single event is not compute-intensive, minimizing round-trip time is more important than reducing processing delay, especially for reactive and real-time systems [27]. However, as cloud computing resources scale better and moving toward the edge reduces flexibility and increases cost, it is still important to process events as close to the cloud as possible. That means selecting the infrastructure node that provides the most flexibility and least expensive compute power from the set of nodes on the shortest path between the event source and its sink.
In data analytics, however, time sensitivity is not the main priority. Operations are complex and require a lot of processing power. These operations range from filtering or aggregation to predictive analytics with machine learning. Furthermore, services here must consider and even combine data from different sources. This also includes complex event processing, where events from thousands of different sources, such as IoT sensors, are analyzed in an aggregated manner. On the cloud-edge continuum, data analytics processors that preprocess data should be kept as close to the edge as possible to reduce data volume on the network, yet also as close to the cloud as necessary given their computational complexity. Compute-heavy operators, on the other hand, should be placed near the cloud, where processing is cheaper.
Given these best practices, we can filter the set of application design options. When filtering, we consider each option individually. First, we identify whether each application path targets event processing or data analytics. For an event processing application path, infrastructure nodes located on the shortest path between the infrastructure components that host the event source and sink are an efficient location for software services. For a data analytics application path, we favor preprocessing of data close to the edge where possible. This reduces usage of bandwidth toward the cloud, where we propose to place more complex data processing. We also rule out options where the resulting data flow uses the same network links more than twice.
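The event processing rule above can be sketched as a simple filter. The snippet assumes an unweighted topology in which the shortest path is measured in hops, and it considers only one shortest path if several exist; both are simplifications. The functions `shortest_path` and `keep_event_placement` and the toy topology are hypothetical.

```python
from collections import deque
from typing import Dict, List

def shortest_path(links: Dict[str, List[str]], start: str, goal: str) -> List[str]:
    """Breadth-first search; returns node names from start to goal (or [])."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return []

def keep_event_placement(placement: Dict[str, str], links: Dict[str, List[str]],
                         source_node: str, sink_node: str) -> bool:
    """Keep a placement only if all services sit on the source-to-sink path."""
    on_path = set(shortest_path(links, source_node, sink_node))
    return all(node in on_path for node in placement.values())

# Toy topology: sensor and actuator share a gateway that connects to the cloud
# via an edge data center.
links = {
    "sensor": ["gateway"],
    "actuator": ["gateway"],
    "gateway": ["sensor", "actuator", "edge-dc"],
    "edge-dc": ["gateway", "cloud"],
    "cloud": ["edge-dc"],
}
candidates = [{"trigger": "gateway"}, {"trigger": "edge-dc"}, {"trigger": "cloud"}]
kept = [p for p in candidates if keep_event_placement(p, links, "sensor", "actuator")]
```

Here only the gateway placement survives, because neither the edge data center nor the cloud lies on the path between sensor and actuator. Choosing the most cloud-ward node among several eligible nodes would be a further step not shown here.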

Step 3: Simulation
In the third step, we use simulation to analyze the remaining application design options. The simulation is conducted using FogExplorer [9,10], which we presented in previous work. FogExplorer can be used in an interactive way, with application designers able to update mappings and observe the resulting metric values instantly. Alternatively, FogExplorer can also be used in a batch mode through its API.
Based on an infrastructure and software model, FogExplorer calculates four metrics per mapping: processing cost, processing time, transmission cost, and transmission time. Processing cost and transmission cost describe the average cost per second within the system. Processing time and transmission time describe latency induced by services and transmission of data.
To calculate these metrics, FogExplorer first determines the data stream routing by identifying the path with the lowest total bandwidth cost for each pair of communicating software components. In a second step, FogExplorer calculates resource usage to verify that the selected mapping does not exceed resource limits. For example, a connection may have a limited amount of bandwidth. In this case, FogExplorer will determine if the bandwidth required by any connection within the mapping exceeds the available bandwidth. In the third step, FogExplorer calculates total cost based on resource usage. Transmission costs depend on bandwidth used and the respective bandwidth price. In a similar manner, FogExplorer calculates processing costs. In addition, FogExplorer also determines time metrics and calculates processing time and transmission time for each application path. Processing time is the total latency induced by services processing data, while transmission time is the total connection latency along the application path. Finally, FogExplorer tallies transmission costs and processing costs, as well as transmission times and processing times to project the total cost and end-to-end latency of the given mapping.
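To make the routing and transmission steps concrete, the following Python sketch routes one data stream over the path with the lowest total bandwidth price and then derives its transmission cost and time. It is not FogExplorer's implementation or API. The link attribute names follow the model described in this paper, while the functions, the topology, the units, and the per-stream bandwidth check (a full calculation would sum all streams that share a link) are assumptions.

```python
import heapq
from typing import Dict, List, Tuple

LinkKey = Tuple[str, str]

def cheapest_route(links: Dict[LinkKey, Dict[str, float]],
                   start: str, goal: str) -> List[LinkKey]:
    """Dijkstra over bandwidthPrice; returns the link keys from start to goal."""
    adjacency: Dict[str, List[Tuple[str, LinkKey]]] = {}
    for key in links:
        a, b = key
        adjacency.setdefault(a, []).append((b, key))
        adjacency.setdefault(b, []).append((a, key))
    best = {start: 0.0}
    queue = [(0.0, start, [])]
    while queue:
        cost, node, route = heapq.heappop(queue)
        if node == goal:
            return route
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, key in adjacency.get(node, []):
            new_cost = cost + links[key]["bandwidthPrice"]
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, route + [key]))
    raise ValueError(f"no route from {start} to {goal}")

def transmission_metrics(links: Dict[LinkKey, Dict[str, float]],
                         route: List[LinkKey], rate: float) -> Tuple[float, float]:
    """Return (cost per second, latency) of one stream sent along the route."""
    for key in route:
        if rate > links[key]["availableBandwidth"]:
            raise ValueError(f"link {key} exceeds availableBandwidth")
    cost = sum(links[key]["bandwidthPrice"] * rate for key in route)
    time = sum(links[key]["latency"] for key in route)
    return cost, time

# Assumed units: price per MB, bandwidth in MB/s, latency in ms.
links = {
    ("gateway", "edge-dc"): {"bandwidthPrice": 0, "availableBandwidth": 100, "latency": 2},
    ("edge-dc", "cloud"): {"bandwidthPrice": 2, "availableBandwidth": 100, "latency": 20},
    ("gateway", "cloud"): {"bandwidthPrice": 5, "availableBandwidth": 100, "latency": 50},
}
route = cheapest_route(links, "gateway", "cloud")
cost, time = transmission_metrics(links, route, rate=3)
```

In this toy topology, the stream is routed via the edge data center because the direct link to the cloud is more expensive. Two components placed on the same node yield an empty route and thus zero transmission cost and time.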
We use FogExplorer to further filter out application design options as the third step of our proposed process. To use this simulation, we have to extend our software and infrastructure models slightly.
In the infrastructure model, we also specify different hardware options that are available for each node. At an edge data center location, for instance, it may be possible to install different types of servers with different capabilities as well as different price points. Here, FogExplorer allows us to compare these different options to find the most efficient one. Although this increases the space of application design options, it is necessary to determine the optimal infrastructure. For each infrastructure option at each node, we specify a relativePerformanceIndicator, which is a rough estimate of compute power compared to a chosen reference machine. For instance, if a machine type has a performance indicator of 2, it is twice as "fast" as the reference machine. Furthermore, the availableMemory metric specifies how much memory is available for the machine and the price metric specifies the price for using the machine. Network components are extended with an availableBandwidth, a bandwidthPrice, and a latency for each connection. If latency cannot be accurately benchmarked ahead of time, it is also possible to use estimates based on link layer performance and geographical locations of nodes as done in [28], for example.
Similarly, we add quantitative attributes to software model components as well. Sources produce data at a constant rate that we mark as their average outputRate in the form of byte/s. The rate at which services output data depends on their input rate, hence we use an outputRatio to calculate their outgoing bandwidth. For services, we also employ a referenceProcessingDelay factor that describes how long, on average, the service needs to process data on the aforementioned reference machine, and a requiredMemory metric to describe the amount of memory needed by the service. Of course, both sinks and sources as software components require a certain amount of memory as well once they are running on an infrastructure node. The infrastructure nodes then incur costs for running these components. As we have described, however, mapping for sources and sinks is fixed, as these components relate to objects in the physical world. Accordingly, while it is possible to simulate costs incurred here as well, these costs would be static and, subsequently, not influence our decision on one application design option versus another. This is why we omit them in the simulation and focus exclusively on resources required by service components. The extended version of our example software model from Section 3 is depicted in Fig. 4. We also introduce SLOs in the form of limits to end-to-end latency for each application path at this point. As we have described in Section 2.3, we measure efficiency for fog application design in cost and latency. Yet because cost and latency depend on each other, finding the most efficient application design is a difficult multi-objective optimization problem. Rather than finding the quantitatively optimal solution, we apply constraints in the form of SLOs to convert this problem into a single-objective optimization problem.
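How the quantitative attributes of this step interact can be sketched in a few lines of Python. The attribute names follow the text above. The class names, the units, the example values, and in particular the assumption that processing delay scales inversely with the relativePerformanceIndicator are illustrative choices, not a description of FogExplorer's internals.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Machine:
    relativePerformanceIndicator: float  # 1.0 corresponds to the reference machine
    availableMemory: int                 # assumed unit: MB
    price: float                         # assumed unit: cost per second

@dataclass
class Service:
    name: str
    outputRatio: float                   # outgoing rate divided by incoming rate
    referenceProcessingDelay: float      # assumed unit: seconds on the reference machine
    requiredMemory: int                  # assumed unit: MB

def outgoing_rate(incoming_rate: float, service: Service) -> float:
    """Outgoing bandwidth of a service in byte/s."""
    return incoming_rate * service.outputRatio

def processing_delay(service: Service, machine: Machine) -> float:
    # Assumption: a machine with indicator 2 halves the reference delay.
    return service.referenceProcessingDelay / machine.relativePerformanceIndicator

def fits_in_memory(services: List[Service], machine: Machine) -> bool:
    """Check that all services placed on a machine fit into its memory."""
    return sum(s.requiredMemory for s in services) <= machine.availableMemory

edge_server = Machine(relativePerformanceIndicator=2.0, availableMemory=512, price=0.001)
preprocess = Service("preprocess", outputRatio=0.5,
                     referenceProcessingDelay=0.04, requiredMemory=256)
aggregate = Service("aggregate", outputRatio=0.25,
                    referenceProcessingDelay=0.10, requiredMemory=384)

source_rate = 1000.0  # outputRate of a source in byte/s
rate_after_preprocess = outgoing_rate(source_rate, preprocess)
rate_after_aggregate = outgoing_rate(rate_after_preprocess, aggregate)
delay = processing_delay(preprocess, edge_server)
both_fit = fits_in_memory([preprocess, aggregate], edge_server)
```

In this example, the data rate shrinks from 1000 byte/s to 500 byte/s and then to 125 byte/s along the path, and placing both services on the same edge server would exceed its memory limit.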
While it depends on the specific application, the economic law of diminishing returns usually also applies to the tradeoff between cost and latency [29]. For example, imagine both a user-facing web service and a machine-to-machine communication use case. In the first use case, investing a considerable sum to decrease latency by 10ms would often not be useful, but it could be in the second scenario. Application designers can set the required access latency for all application paths arbitrarily high or low as is required by the application. The actual limits depend entirely on the business logic and required safety or performance objectives and our process will optimize cost within these specified service levels.
Given these limits on end-to-end latency, we only further consider those models that satisfy these constraints in an efficient way (i.e., at low cost). From the set of application design options, we select only those that do not violate the service levels for any application path as defined in Section 3.2. If no model conforms to these constraints, it is useful to reconsider the constraints or available infrastructure. From the remaining design options, we now select those that we will consider in the testbed step in light of the remaining influence factor (i.e., total cost). As testbed evaluation is expensive and time-consuming, the number of application design options that will be benchmarked needs to be low. On the other hand, the design options that are identified as good options in the simulation step are not necessarily the best options (i.e., it can be beneficial to proceed with a broader variety of options). We propose to solve this tradeoff by proceeding with design options that lie in the 95th percentile when considering their total cost (i.e., the 5% of design options that have the lowest total cost). If necessary, this range can be adapted.
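The selection described above can be sketched as follows: first discard every option that violates an SLO, then keep the cheapest fraction of the remainder. The function `select_for_testbed`, the dictionary layout of an option, and the synthetic numbers are assumptions for illustration only.

```python
import math
from typing import Dict, List

def select_for_testbed(options: List[dict], slos: Dict[str, float],
                       fraction: float = 0.05) -> List[dict]:
    """Keep SLO-conforming options, then the cheapest `fraction` of them."""
    conforming = [
        option for option in options
        if all(option["latency"][path] <= limit for path, limit in slos.items())
    ]
    if not conforming:
        return []  # reconsider the constraints or the available infrastructure
    conforming.sort(key=lambda option: option["total_cost"])
    keep = max(1, math.ceil(len(conforming) * fraction))
    return conforming[:keep]

# Synthetic simulation results: cheaper options have a higher latency on path A1.
options = [
    {"id": i, "total_cost": 200 - i, "latency": {"A1": 10 + i}}
    for i in range(60)
]
slos = {"A1": 59}  # hypothetical latency limit for application path A1

selected = select_for_testbed(options, slos)
selected_ids = [option["id"] for option in selected]
```

The ten cheapest options in this synthetic data set violate the SLO and are discarded first. Of the 50 conforming options, the three cheapest proceed to the testbed step. Widening `fraction` trades testing cost for a broader variety of options.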

Step 4: Experiments on an Emulated Testbed
In the fourth step of our process, we evaluate design options using experiments on an emulated fog testbed. This evaluation requires an implementation of the application software that we can deploy to the testbed and is thus the most time-consuming and costly step. However, the low number of viable options that remain after the first three steps of our process limits the required experiments. Furthermore, it also limits the implementation effort required as services only need to be implemented for the platforms they can be deployed on in the remaining application design options [19,30].
To benchmark fog application design options, we propose using MockFog as we presented in [16]. MockFog provides an emulated yet realistic environment for functional testing and benchmarking of fog applications in the cloud. In MockFog, cloud, edge, and intermediate nodes as well as IoT devices are instantiated as cloud virtual machines. Compute power, memory, and inter-node network characteristics such as latency or failure rates can be configured. Failure scenarios can be emulated as well.
Once again, we need to modify our initial software and infrastructure models to fit the model used by MockFog. Instead of a performance indicator as given for machines in the infrastructure model, we now need to quantify the actual compute power, memory, and storage capabilities. Furthermore, we have to define bandwidth and latency parameters for network connections. MockFog introduces routers between connected machines rather than direct connections. Hence, in order for all nodes to be able to communicate, we have to add these routing components where applicable.
Rather than extending the application model, we need to replace it with actual implementations of service and sink components that we then deploy on the MockFog testbed. For source components, the majority of which are IoT devices, implementation is more difficult. These source components need to produce IoT data in conformance with the application model. It is possible to use traces of real IoT data (e.g., through BenchFoundry [31]) or to attach real world IoT devices, although this requires consideration of network conditions between these devices and the MockFog testbed location. Finally, we can also employ artificial workload generators such as Apache JMeter to generate data.
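As a rough illustration of an artificial source, the following Python sketch produces a deterministic schedule of random payloads whose average rate matches the outputRate from the application model. It is a hypothetical stand-in and unrelated to JMeter, BenchFoundry, or MockFog; an actual generator would additionally send each payload over the network at its scheduled time.

```python
import random
from typing import Iterator, Tuple

def synthetic_source(output_rate: float, message_size: int, duration: float,
                     seed: int = 0) -> Iterator[Tuple[float, bytes]]:
    """Yield (send_time_in_seconds, payload) pairs at output_rate byte/s."""
    rng = random.Random(seed)              # seeded for repeatable experiments
    interval = message_size / output_rate  # seconds between two messages
    count = int(duration * output_rate / message_size)
    for i in range(count):
        payload = bytes(rng.getrandbits(8) for _ in range(message_size))
        yield i * interval, payload

# A source with an outputRate of 1000 byte/s, sending 100-byte messages for 2 s.
schedule = list(synthetic_source(output_rate=1000, message_size=100, duration=2))
total_bytes = sum(len(payload) for _, payload in schedule)
```

Seeding the generator keeps the workload identical across the design options under test, so that differences in measured latency stem from service placement rather than from the input data.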
On the emulated MockFog testbed we can then analyze the behavior of the IoT application, especially in the context of component placement. While the MockFog environment also allows us to change configuration parameters at runtime (e.g., to inject failures), we use it only to benchmark application designs under the assumption that the provided application implementation is correct.

Step 5: Selection and Deployment
After these four steps, an informed decision on the best design option can be made. The selected design option is likely to be the most efficient one regarding cost and service level, as it has been selected through best practices and simulation as well as verified on an emulated fog testbed. If in doubt, the best two or three options can then also be test-deployed in the real runtime environment or on a physical testbed to further substantiate the results.

Evaluation
To evaluate our approach, we use a case study based on a smart factory scenario. In the first part, we follow the process described in Section 3 to show that it can be used in practice. We make all software we use available as open source.
In the second part (Section 4.2), we show that the design option identified by our process is among the best options. For this purpose, we implement the design on a physical testbed and compare it to alternative design options. Due to the number of permutations and the resulting experiment effort, it is not feasible to show that the identified option is the best option. We therefore rely on sampling and run experiments with randomly selected design options that we discarded in earlier process steps.

Case Study
In our case study, we apply our proposed process to a smart factory IoT application. We start by describing the scenario and deriving software and infrastructure models (Section 4.1.1). We then apply our set of best practices (Section 4.1.2), use simulation (Section 4.1.3) and testbed experiments with the implemented software services (Section 4.1.4) to identify good design options, and briefly discuss the results of our approach (Section 4.1.5). This shows that it is indeed possible to pursue our proposed process and to pick a resulting design option.

Smart Factory IoT Application
We provide an overview of our IoT application's components in Figure 5. The factory comprises a factory floor, a small data center, and a logistics office. In addition to the factory, there is a central office in an offsite location.
The factory floor has two machines: the Production Machine produces a part that the Packaging Machine then prepares for shipment. To ensure that the Packaging Machine processes only faultless parts, the Production Machine has an attached camera that takes a picture of each produced part and checks for defects. The Packaging Machine should adapt its speed to the output rate of the preceding machine. Furthermore, the Packaging Machine can only operate within a fixed ambient temperature range and thus has a temperature sensor installed that will shut it off if necessary. Each machine is also equipped with a controller that controls the speed at which the machine operates. These controllers are able to communicate over a common wireless gateway. In the onsite logistics office, logistics personnel decide when to arrange outgoing product shipments. To this end, a logistics dashboard predicts machine output based on recent productivity. The factory data center provides some compute power and a connection to the WAN. In the central company office in an offsite location, the business requires central reporting of factory productivity. This central office also has a collocated medium-size data center. Additionally, it is possible to leverage cloud computing to outsource some computational tasks.
Figure 6: Data sources, services, and sinks in our application. We mark application paths A1-A4 for the components.
We use this information to create our infrastructure model with the cloud, data centers in the smart factory and central office, as well as the wireless gateway, machine controllers, and sensor nodes that all have additional compute capabilities. We also derive the following application paths from the initial concept (see Figure 6 for the software model):

A1:
The Camera takes pictures of parts leaving the first machine and the Check for Defects service analyzes each picture for defects. In case of a defect, the service instructs the Production Controller to discard the respective part.

A2:
The Production Controller has information on the output rate of the machine that produces parts and uses this information to adapt the packaging rate of the packaging machine through an intermediary service. As a second input, the Packaging Controller also relies on data from the Temperature Sensor to control the packaging rate. When temperature readings leave a specified range, as detected by the Adapt Machine service, the Packaging Controller instructs the packaging machine to pause operation.

A3:
The Packaging Controller provides data on the rate and amount of packaged parts to the Predict Pickup service that feeds into the Logistics Team Prognosis.

A4:
Data from the Packaging Controller is also consumed by a service that aggregates and filters that data to generate a dashboard for the central office, which then runs inside a browser on a machine in the central office.

Data sources and sinks closely mirror the real world and placement for them is straightforward. For example, the Camera component in both the infrastructure and software models is the same device as in the real world. For services, however, we still need to find an efficient mapping. To this end, we now follow the process introduced in Section 3.

Applying Best Practices
As described in Section 3.3, we need to consider all application paths individually in this step. We begin by classifying each application path and then use the corresponding best practice advice to filter out some application design options.
A1: Although a photo is larger than a sensor value, we classify A1 as event processing. Each photo corresponds to an event in the physical world, in this case the production of a part. The Camera translates this event into a message carrying metadata in the form of an image. Processing the image is also time-critical as the Production Machine needs to discard any faulty parts before they arrive at the Packaging Machine. Although the event message has a relatively large size, the Check for Defects service on this application path only needs to consider one source at a time, which, depending on the complexity of analysis for each event, limits processing time. As such, limited bandwidth and high network latency can be bigger factors in not achieving QoS goals here. Therefore, image processing should at least be kept on factory premises, or even inside the machine on either the Camera or Production Controller. A more specific decision is not possible as long as more detailed information about service complexity and infrastructure capabilities is not available at this stage.

A2: We can make a similar argument for A2. Here, two event sources produce events independently but a single service that controls the packaging rate consumes all of them. Again, we classify this path as event processing as events are small in size and decisions need to be made quickly. Although the service consumes two data sources, its complexity is also low as it does not consider historic data and performs only simple calculations. Thus, placing the Adapt Machine service on factory premises, close to data sources and sinks, is the most efficient option.
A3: Despite using only one data source producing rather simple data items, we classify A3 as data analytics since it needs to consider current and historical data. In addition, the processing is more complex as the goal is to predict future packaging rates. Furthermore, QoS limits for latency are in the range of seconds (rather than milliseconds) as the staff will only periodically check the report. Consequently, depending on prediction complexity, we propose placing the Predict Pickup service where compute power is the cheapest, in the cloud or a data center for instance. Correct placement then comes down to a cost calculation between bandwidth price and compute costs, and is part of the subsequent simulation.
A4: Finally, A4 monitors the factory output rate to feed data into a dashboard in the central office. This is also a data analytics workflow and there are no strict latency constraints. Instead, data amount and processing complexity are again the limiting factors. Consequently, as the Aggregate service is a preprocessing step, placing it close to the Packaging Controller limits bandwidth usage. Similar to A3, we can then place the complex processing service Generate Dashboard where processing is available for the lowest price, which is likely to be the cloud or one of the data centers.
Starting with five services that we can deploy to one of eight infrastructure components each, there are thus 32,768 permutations, and that number grows exponentially with additional services or infrastructure components. By following our best practices, we managed to reduce the set of options to only 864.
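The size of the unfiltered design space follows directly from these numbers. The following Python sketch enumerates all placements to reproduce the count. The service names are taken from the case study; the list of eight nodes is assembled from the components named in the scenario description and is an assumption about which eight components are meant.

```python
from itertools import product

# Services from the case study's software model.
SERVICES = [
    "Check for Defects",
    "Adapt Machine",
    "Predict Pickup",
    "Aggregate",
    "Generate Dashboard",
]

# Assumed reading of the eight infrastructure components.
NODES = [
    "Camera",
    "Production Controller",
    "Sensor",
    "Packaging Controller",
    "Wireless Gateway",
    "Factory Data Center",
    "Office Data Center",
    "Cloud",
]


def all_placements(services, nodes):
    """Yield every mapping of services to nodes (one node per service)."""
    for assignment in product(nodes, repeat=len(services)):
        yield dict(zip(services, assignment))


total = sum(1 for _ in all_placements(SERVICES, NODES))
remaining_after_best_practices = 864  # figure reported in the text
share_kept = remaining_after_best_practices / total

print(total)                 # 32768, i.e. 8 ** 5
print(f"{share_kept:.1%}")   # 2.6%
```

Adding a sixth service under the same assumptions would multiply the total by eight, which is the exponential growth the text refers to.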

Application Simulation
We now use FogExplorer to simulate QoS and the cost of the remaining application design options as explained in Section 3.4. To use FogExplorer, we first need to extend the software and infrastructure models shown in Figures 7 and 8, respectively. To give an example, in the application model the camera produces data in the form of images at a rate of 100kb/s, and the subsequent service takes an estimated 20ms to process data items on a reference machine with an outputRatio of 0.1, meaning that with a 100kb/s input it outputs 10kb/s. Furthermore, this service requires 250MB of memory. For each application path we also introduced QoS requirements in the form of latency limits. In the simulation, we discard any service mapping that violates either of these conditions. For the A1 application path, for instance, we set an upper limit of 50ms as the delay between taking a picture and the command reaching the production controller. In the infrastructure model, we introduce different machine options with different capabilities and price points for some nodes. For example, there are two options for the camera component: One has computational capabilities of 0.1% that of the reference machine with 1MB of memory at a price of $0.5/month while the other has 5% of the performance of the reference machine with 10MB of memory available at a higher price of $5/month. As our case study is fictional, we estimate these prices in lieu of actual infrastructure. As a basis, we use pricing for a moderate compute instance with a 2-core processor and 4GB of memory on Amazon Web Services (AWS) Lightsail 5 , which costs $20/month. This is similar in price and performance to the medium machine option for the Factory Data Center node. We estimate total cost of ownership per performance to be lower near the cloud and with more powerful machine options, but higher near the edge where maintenance is a greater factor, and extrapolate accordingly.
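To make the example parameters concrete, the following Python sketch encodes the service after the camera and the two camera machine options with the values given above. The class and function names are hypothetical and do not reflect the FogExplorer data format. In particular, scaling processing time inversely with the performance indicator is a simplifying assumption made here for illustration, not a documented FogExplorer rule.

```python
from dataclasses import dataclass


@dataclass
class Service:
    processing_time_ms: float   # per data item on the reference machine
    output_ratio: float         # output rate divided by input rate
    required_memory_mb: float


@dataclass
class MachineOption:
    relative_performance: float  # 1.0 corresponds to the reference machine
    available_memory_mb: float
    price_per_month: float


def output_rate(input_rate: float, service: Service) -> float:
    """Output data rate in the same unit as the input rate."""
    return input_rate * service.output_ratio


def fits_in_memory(service: Service, machine: MachineOption) -> bool:
    return service.required_memory_mb <= machine.available_memory_mb


def scaled_processing_time_ms(service: Service, machine: MachineOption) -> float:
    # Assumption: processing time scales inversely with relative performance.
    return service.processing_time_ms / machine.relative_performance


# Values from the text: 20ms, outputRatio 0.1, 250MB of memory.
check_for_defects = Service(
    processing_time_ms=20.0, output_ratio=0.1, required_memory_mb=250.0
)
# The two camera options from the text.
small_camera = MachineOption(
    relative_performance=0.001, available_memory_mb=1.0, price_per_month=0.5
)
large_camera = MachineOption(
    relative_performance=0.05, available_memory_mb=10.0, price_per_month=5.0
)

print(output_rate(100.0, check_for_defects))             # about 10 (kb/s)
print(fits_in_memory(check_for_defects, small_camera))   # False
print(fits_in_memory(check_for_defects, large_camera))   # False
print(scaled_processing_time_ms(check_for_defects, large_camera))
```

Under these assumptions, neither camera option can host the service: both lack the required memory, and even the larger option would exceed the 50ms limit of A1 on processing time alone.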
The A2 application path has two sources and, depending on its placement, these sources have a different connection latency to their common service. As both sources send their data in parallel, we consider the maximum end-to-end latency for this application path and assert that this does not violate the QoS.
Figure 8: We extend infrastructure components and their network links with more attributes as required by FogExplorer: each node has a relativePerformanceIndicator, availableMemory, and MemoryPrice. Network connections have a latency, availableBandwidth, and a bandwidthPrice. Square brackets denote that more than one hardware option is available at a specific node. These hardware options differ in price and capability.
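The maximum rule for application paths with several sources, such as A2, can be sketched in a few lines of Python. The latency values and the latency limit below are hypothetical placeholders, as the text does not give the SLO for A2; the function names are likewise illustrative.

```python
def path_latency_ms(latency_by_source_ms: dict) -> float:
    """End-to-end latency of a path whose sources send in parallel:
    the slowest source determines the latency of the whole path."""
    return max(latency_by_source_ms.values())


def meets_slo(latency_by_source_ms: dict, limit_ms: float) -> bool:
    return path_latency_ms(latency_by_source_ms) <= limit_ms


# Hypothetical per-source end-to-end latencies for one placement of A2.
a2_latencies = {"Production Controller": 3.5, "Temperature Sensor": 0.8}
HYPOTHETICAL_A2_LIMIT_MS = 10.0

print(path_latency_ms(a2_latencies))                      # 3.5
print(meets_slo(a2_latencies, HYPOTHETICAL_A2_LIMIT_MS))  # True
```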
We automate the simulation using the Node.js interface of FogExplorer. Although the number of possible application design options grows exponentially with software and infrastructure components and machine options for nodes, the preceding step in which we discarded options using best practices already limited those options, allowing us to simulate all remaining design options efficiently. In fact, with current software and infrastructure models we need to consider only 186,624 different options and are able to simulate and calculate metrics for all of them in about one minute on a standard laptop computer. For comparison purposes, and to emphasize the importance of the first step of our process, there is a total of 7,077,888 application design options and a complete simulation of those takes 50 minutes just for this simple use case. As such, using only simulation without applying best practices first is infeasible especially for more complex application scenarios.
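The option counts in this step can be related to the placement counts of the previous step with a short calculation. The text does not spell out how the totals are composed; the Python sketch below only checks that the reported figures are consistent with every service placement being combined with the same number of machine-option combinations, which is an interpretation rather than a statement from the simulation tooling.

```python
# Figures reported in the text.
placements_unfiltered = 8 ** 5          # 32,768 service placements
placements_after_best_practices = 864
options_unfiltered = 7_077_888
options_after_best_practices = 186_624

# Implied number of machine-option combinations per placement.
hardware_combinations = options_after_best_practices // placements_after_best_practices
print(hardware_combinations)  # 216

# The same factor links the unfiltered counts.
print(placements_unfiltered * hardware_combinations == options_unfiltered)  # True

# Reduction in simulation workload achieved by the best-practice step.
reduction_factor = options_unfiltered / options_after_best_practices
print(round(reduction_factor, 1))  # 37.9
```

The workload shrinks by a factor of roughly 38, which is in line with the reported drop from 50 minutes to about one minute of simulation time.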
In addition to overall cost and time metrics, we also calculate metrics for each application path on its own. This helps us discard options that violate SLO limits. From 186,624 possible application design options only 2,520 are valid and only 215 remain after applying the latency limits we defined. Consequently, FogExplorer lets us discard 99.9% of application design options, which are impossible to deploy in practice given infrastructure and SLO constraints.
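The two filtering stages described here, resource validity first and latency limits second, can be sketched as follows in Python. The record type and the toy options are hypothetical and stand in for the simulation output; only the 50ms limit for A1 and the option counts are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimulatedOption:
    name: str
    resources_ok: bool                  # enough memory and bandwidth everywhere
    path_latency_ms: Dict[str, float]   # simulated latency per application path
    cost: float


def within_slos(option: SimulatedOption, limits_ms: Dict[str, float]) -> bool:
    return all(option.path_latency_ms[path] <= limit
               for path, limit in limits_ms.items())


def filter_options(options: List[SimulatedOption],
                   limits_ms: Dict[str, float]) -> List[SimulatedOption]:
    valid = [o for o in options if o.resources_ok]          # stage 1
    return [o for o in valid if within_slos(o, limits_ms)]  # stage 2


limits = {"A1": 50.0}  # upper limit for A1 from the text
toy_options = [
    SimulatedOption("x", True, {"A1": 32.0}, 40.0),
    SimulatedOption("y", False, {"A1": 12.0}, 25.0),   # lacks resources
    SimulatedOption("z", True, {"A1": 75.0}, 20.0),    # violates the A1 limit
]
print([o.name for o in filter_options(toy_options, limits)])  # ['x']

# Share of options discarded in the case study (215 of 186,624 remain).
discarded_share = 1 - 215 / 186_624
print(f"{discarded_share:.1%}")  # 99.9%
```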
The options that remain are therefore those that conform to all infrastructure and QoS constraints and we can now choose those that have the lowest overall cost according to the simulation. We select the application designs in the 95th percentile in the pool of options based on cost, a total of ten designs. From the simulation, it is clear that placing the Check for Defects service of the A1 application path in the Factory Data Center, the Adapt service of the A2 path on the Packaging Controller or the Factory Data Center, and the Aggregate service of path A4 on the Wireless Gateway are the most efficient application design options. Furthermore, it becomes apparent that the Camera, Production Controller, and Sensor do not require additional compute capabilities as they do not need to run any data processing services. For the Factory Data Center, the simulation recommends the medium machine option for each application design option and the least expensive options for both Office Data Center and Cloud.
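Selecting the least expensive share of the remaining options can be sketched as below. The exact percentile rule is not specified in the text; rounding the 5% share down is an assumption that happens to reproduce the reported ten designs out of 215. The cost values are synthetic.

```python
import math


def cheapest_share(costs, share=0.05):
    """Return the lowest-cost entries making up the given share of all options.
    Assumption: the share is rounded down to a whole number of options."""
    count = math.floor(len(costs) * share)
    return sorted(costs)[:count]


# Synthetic monthly costs for 215 valid options (placeholder values).
synthetic_costs = [100 + i for i in range(215)]
selected = cheapest_share(synthetic_costs)

print(len(selected))   # 10
print(max(selected))   # 109
```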

Emulated Testbed
Based on the simulation, we chose the ten most efficient application designs and can now deploy these on an emulated MockFog testbed. Before deployment can begin, we must first implement our software components. To this end, we implement each source, service, and sink in Go 1.14. We then install the compiled binaries on the MockFog nodes as Docker containers. We use an extended version of MockFog for our experiments that is available with all other software artifacts. Each node in the system maps to one instance on AWS Elastic Compute Cloud (EC2) 6 in the same availability zone of the us-east-1 region. To emulate different kinds of hardware, we use different instance types. We show the mapping from referencePerformanceIndicator as employed in FogExplorer to EC2 instance types in Table 1. For instances of the t2 family, we enable unlimited accrual of CPU credits to prevent inconsistent CPU bursting. Given the limited number of available instance types, however, this is not as fine-grained as the referencePerformanceIndicator in FogExplorer. It is also not possible to set the availableMemory to the same value as in the FogExplorer infrastructure model. To validate performance differences between instance types we use the sysbench CPU benchmark in version 1.0.20 7 . This benchmark calculates all prime numbers up to a certain limit, which we set at 1,000,000, in 1,024 threads simultaneously. It then reports a CPU speed metric that describes the number of events the benchmarked CPU was able to handle per second, with each event corresponding to one completed prime computation. We repeat this benchmark three times and report median results. As shown in Table 1, this metric scales nearly linearly with the number of CPU cores. Note that in order to leverage this performance for our application, the services we deploy have to actually use all available CPU cores. To this end, our implemented application services use multithreading through goroutines. Nevertheless, we can expect that performance does not scale strictly linearly with the number of CPU cores in practice.
MockFog sets artificial network bandwidth and latency limits between machines and deploys our software components to the machines. The mappings for sinks and sources are identical each time, for instance with the Camera process running on the Camera node. Service mappings follow the ten most efficient design options identified through simulation. For each option, MockFog runs the application for 20 minutes and then collects logs to determine end-to-end latency for each application path. We repeat this process three times to gain accurate results and use median results in further analysis. We measure end-to-end latency by attaching timestamps and unique identifiers to each request that passes through the system. Each component logs when it sends or receives a request with a specific identifier. One problem with measuring end-to-end latency in this manner is clock skew. When the clocks of two machines are not in sync, the measurement can become inaccurate. To limit this effect, all machines synchronize their clocks through the AWS Time Sync Service in their region before the experiments run. This resulted in clock deviations of under 0.3ms during our experiments.
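The measurement approach, matching send and receive log entries by request identifier and taking the median over repeated runs, can be illustrated with a small Python sketch. The log structure, identifiers, and timestamps are hypothetical and do not reproduce the actual log format of the implementation; timestamps are integer microseconds to keep the arithmetic exact.

```python
from statistics import mean, median


def end_to_end_latencies_us(sent_us: dict, received_us: dict) -> dict:
    """Match requests by identifier and return the latency per request.
    Requests without a receive entry (e.g. still in flight) are skipped."""
    return {
        request_id: received_us[request_id] - sent_us[request_id]
        for request_id in sent_us
        if request_id in received_us
    }


# Hypothetical log excerpts: request identifier -> timestamp in microseconds.
sent = {"req-1": 1_000, "req-2": 2_000, "req-3": 3_000}
received = {"req-1": 4_200, "req-2": 5_500}  # req-3 has not arrived yet

latencies = end_to_end_latencies_us(sent, received)
print(latencies)  # {'req-1': 3200, 'req-2': 3500}

# Average latency of this run, combined with two further hypothetical runs.
run_averages_us = [mean(latencies.values()), 3_400, 3_300]
print(median(run_averages_us))  # 3350
```

Because the two timestamps of a request come from different machines, any clock offset between them enters the result directly, which is why the clock synchronization described above matters.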
Between re-runs of the same experiment setup, we see a small overall coefficient of variation of between 0% and 3%. Consequently, we can say that our experiment results are robust. We use the average end-to-end latency unless stated otherwise and show these results in Figure 9. As expected, latency for the A1 application path is similar across all design options, as the Check for Defects service is always deployed to the same kind of Factory Data Center. On the A2 application path, we observe an end-to-end latency of between 3ms and 4ms when the Adapt service runs on the Packaging Controller and 14ms when placed on the Factory Data Center, due to the increase in network latency caused by additional hops for each request. This difference is even greater when considering only the Sensor source, where end-to-end latency is under a millisecond when the Adapt service is deployed on the Packaging Controller. For the A3 application path, processing latency of the Predict service is higher when it runs on the Factory Data Center, with an average latency of 89ms for application design option 1, and even higher for options 2 and 9, where the Check for Defects, Adapt Machine, and Predict service are all deployed on this node, with 123ms and 108ms, respectively. When the Predict service runs on the Office Data Center or Cloud, this processing latency is lower, between 67ms and 77ms. For placement on the Cloud node, this reduction of processing latency is offset by a considerable increase in network latency to 257ms. The Aggregate service of application path A4 has a processing latency of between 0.1ms and 0.15ms, regardless of the machine type of the Wireless Gateway, to which this service is always deployed. At this scale, this difference could also be attributed to measurement error. The Generate Dashboard service has a lower processing latency when deployed to the Cloud at 89ms to 90ms than when deployed to the Factory Data Center, where processing latency ranges from 95ms up to 109ms. Yet again this difference is offset by transmission latency, which is lower for the Factory Data Center at 23ms, as compared to 243ms for the Cloud.
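The coefficient of variation used above to judge robustness across re-runs is the standard deviation divided by the mean. A minimal Python sketch follows; the three run results are hypothetical, and the use of the sample standard deviation (rather than the population variant) is an assumption, as the text does not specify which was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples) -> float:
    """Sample standard deviation relative to the mean."""
    return stdev(samples) / mean(samples)


# Hypothetical average latencies (ms) of three re-runs of one setup.
runs_ms = [100.0, 102.0, 98.0]
cv = coefficient_of_variation(runs_ms)
print(f"{cv:.1%}")  # 2.0%
```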
As already ensured through simulation with FogExplorer, all application design options we benchmarked on the MockFog testbed comply with all SLOs defined for the application paths.
Figure 10: Service mapping and infrastructure option in the best application design option as determined in our case study

Determining the Final Application Design
Using the results from our MockFog experiments, we can now discard more application design options. Of the ten application design options we deployed to the emulated testbed, option 5 is the most efficient. We show service mapping and determined infrastructure options in Figure 10. Here, the Factory Data Center hosts the Check for Defects and Generate Dashboard services, the Adapt Machine service is placed on the Packaging Controller, the Predict Pickup service on the Office Data Center, and the Wireless Gateway is used for the Aggregate service. As infrastructure options, we use the smallest available machines for the Wireless Gateway and Office Data Center, and the medium option for the Factory Data Center. In this application design option, the Cloud is not used to host any services, hence we do not require a machine there. Here, we skip the optional deployment of several options on a physical fog testbed as described in Section 3.6 since we will do exactly that in our evaluation of result quality in Section 4.2.

Result Evaluation
After having shown the applicability of our process through a case study, we now evaluate it by deploying our resulting architecture on a physical testbed. We benchmark our application with a synthetic workload and determine whether our process has really converged toward the most efficient design by comparing it to application design options that were discarded in earlier steps of the process. In Table 2 we show application design options and the step in which we filtered them out. This table also shows the final application design that our process determined to be the most efficient. The final design has passed the check for best practices, simulation with FogExplorer, and benchmarking on the emulated MockFog testbed. We now further evaluate this design by comparing it to other design options that we filtered out during the process. Obviously, we cannot compare all possible design options. For each filter we applied, we randomly chose three of the discarded design options, deployed them on a physical testbed and benchmarked them. M1-3, F1-3, and B1-3 denote the three designs that were filtered out by MockFog, FogExplorer, and the application of best practices, respectively. For the sake of comparison, we also deploy and benchmark our final, winning design as presented in Section 4.1.5, which we denote as W.
Table 2: Overview of placement options and the step in which the option was discarded. This shows that early process steps alone cannot provide good enough recommendations.
Figure 11: Latency results for experiments on the physical testbed. We show average end-to-end latency measured for all application design options for each application path. Error bars show the standard deviation. † Application design options B1 and B2 were unable to run the Predict Pickup service as the infrastructure component would run out of memory, hence no results for the A3 application path can be shown here.
Software components use the same implementation and deployment method (i.e., Docker containers) as in our emulated MockFog testbed. Our testbed comprises two Raspberry Pi 3B+ single-board computers, one acting as Camera and Production Controller, and the other as Sensor and Packaging Controller. These boards connect over 2.4GHz WiFi to a MacBook Pro with an Intel Core 2 Duo processor that we use as our Wireless Gateway. This computer, in turn, connects to a LAN over Gigabit Ethernet. This network has a 50Mbit/s Internet uplink and a ThinkPad x220 laptop with an Intel Core i5 processor that acts as the Factory Data Center connected to it. Finally, as our Office Data Center, we use a virtual machine instance on AWS EC2 in the eu-west-1 Ireland region. As the Cloud instance, we use an AWS EC2 virtual machine instance in the ap-northeast-2 Tokyo region. The respective instance types depend on the machine type used in the selected application design, see Table 1. Experiments run for 20 minutes after an initial startup time of 5 minutes and are repeated three times. We report the results of the median run. Variance across runs with the same experiment setup was between 1% and 4% for all experiments except for setup M3 (9%) where one outlier had a higher end-to-end latency for the A3 application path, and experiments B1 (15%) and B3 (6%) that were unable to complete correctly. Figure 11 shows the average transmission and processing times measured in our experiments.
Experiments for application design options B1 and B2 were unable to complete as the Predict Pickup service ran out of memory on the Packaging Controller and Wireless Gateway, respectively, where it was deployed with these design options. The B3 option, while able to run all services, leads to a higher latency than others that were selected with the first step of our process. Design option F1 was determined by FogExplorer to comply with all SLOs, yet was not in the 95th percentile cost-wise and was hence discarded. Nevertheless, latency measurements appear to be on par with designs W and M1 through M3. Option F2 violates SLO requirements in the simulation and we observe that it is also less efficient than others we tested, so this elimination was correct. Finally, while FogExplorer discards F3 for insufficient resources, as the Wireless Gateway component here has too little available memory for the Check for Defects service, we were able to deploy it correctly on our physical testbed and latency is similar to our winning design option W. Yet this deployment is more costly than W as it uses more expensive infrastructure components. For options W and M1 through M3, we see results as in MockFog where we tested these design options already. Consequently, design option W again is the most efficient option among those.

Discussion and Limitations
The five-step design process we propose can help to address the challenge of designing efficient fog-based IoT applications. Yet as with all tools, it is important to know its limits to employ it correctly. First and foremost, our proposed process targets static applications. Although not all information about the system is necessarily required upfront and infrastructure and software models are extended and modified along the way, as we have described, our design process is not equipped to deal with dynamic deployment changes such as would be necessary for physically moving sources, sinks, or compute nodes. For example, in order to augment the application with a new service, parts of the process would need to be re-run from the start. While simulation and testbed emulation can be automated, best practices would need to be applied by an actual application design engineer.
While networks with mobile nodes, frequent outages, or regular changes in topology may exist, we envision that static applications such as the smart factory in our case study are common. Furthermore, our process can be used for the static components of a more dynamic application while the dynamic components are deployed using other approaches such as [32].
Additionally, we want to emphasize again that our process is an offline approach, i.e., it is detached from the actual deployed application. Conversely, an online approach to designing fog-based IoT applications would interact with the deployed application and infrastructure to collect metrics or logs and could then move application components around, possibly even making modifications to the infrastructure. An online approach has the benefit that it requires less upfront research and investment and that it can also support dynamically changing applications to some extent. Nevertheless, we find that an offline approach has some key advantages: First, it does not interfere with the production application, as no additional monitoring or orchestration components degrade application QoS. Second, it facilitates infrastructure planning alongside application development. Rather than relying on on-demand fog infrastructure, which might not be easily available, our offline process aids in determining the optimal infrastructure components and their sizing. And third, only a process with a human in the loop can benefit from domain knowledge that is not easily quantifiable. Application developers have the option to step in and adjust the outputs of each step of the process as they see fit, which also helps provide more understandable results that can be considered. The human involvement during the individual steps depends largely on the selected tooling. For instance, MockFog experiments can be conducted in a completely automated fashion.
Another challenge is the number of factors at play in fog application design. We quantify the features of application and infrastructure components. Availability, performance, network latency, or available network bandwidth may be subject to external influence factors, however. For example, sharing a network connection with a different tenant in a cloud data center, an application service executing slower for certain inputs, energy consumption, or job backlogs through failing components can all influence availability, latency, or cost as well.
Table 3: Ease and accuracy trade-off in state-of-the-art approaches to fog application design. To quantify ease of use, we extrapolate the time for a complete investigation of all application design options from our experience with our case study. We also leverage that experience to quantify an estimated reduction of the total solution space as a metric for the accuracy of different approaches. By combining different approaches into one process, we can increase result accuracy without sacrificing ease of use.

Abstracting from such factors in our models means that our simulation and testbed experiments cannot accurately reflect results that we would observe in the real world. We argue, however, that we need this abstraction to keep models and simulation simple, which is in turn necessary to facilitate their use in such a design process. These factors can then be tested later in the process using physical testbeds. In Section 3.4, we introduced SLOs for application paths as a way to convert the multi-objective optimization of cost and service latency for each path into a single-objective optimization of cost within the specified latency constraints. While reducing end-to-end latency is always better, we argue that additional investment can lead to diminishing returns after a certain point. Finding these fixed constraints, however, can be difficult for system designers and setting SLOs too low or too high can have negative impacts on the overall satisfaction with the final application design by unnecessarily increasing cost or latency, respectively. In future work, we want to further explore this relationship between cost and utility of reduced latency so that this decision can be made on a more informed basis.

Related Work
We have described how the correct placement of IoT application components in the fog is difficult yet crucial for an efficient use of resources. This is a known research problem and has been discussed in existing publications. Below we provide an overview of existing approaches for fog application design and indicate how they compare to our approach in Table 3.
Brogi et al. [37] present FogTorch, which models fog infrastructure by parameterizing available fog nodes, communication links, end devices, application components, and QoS constraints, and then finds eligible deployments of application components. While this approach leads to a set of valid application deployment options, solving fog application deployment in this manner is NP-hard, as the authors show. Consequently, finding valid deployment options becomes exponentially harder with each added component and is infeasible for larger deployments. Tong et al. [38] and, to some extent, Heintz et al. [36] take a similar approach to FogTorch, while [4-7, 35, 39-55] employ a more efficient heuristics approach to solve the formalized optimization problem. Naas et al. [56] offer a heuristics-based solution as well, but place data replicas rather than services, a challenge that is also addressed in [57,58]. Formulating such assignment problems requires complete information about the system upfront, including infrastructure and software implementation details. This may be available for existing applications that are moved to a fog infrastructure, yet allows little room for flexibility. For agile development of new applications, however, these details are only slowly emerging. Inarguably, these approaches find optimal solutions in static analysis, yet we propose that benchmarks on emulated or physical testbeds are necessary to verify that calculated results hold up in a real deployment.
Khare et al. [59] also employ heuristics to create an efficient application design for distributed, edge-based stream processing. They apply these heuristics in a multi-step process, where a DAG of the entire application is first split into a set of linear chains for which latency is estimated individually, similar to the application paths we introduced in Section 3.2. The authors here, however, approximate these processing chains algorithmically, which is an interesting alternative approach as it leads to less overhead for application designers, albeit by sacrificing accuracy.
Fogernetes, as proposed in [60], automates the deployment of software services across a number of fog nodes by leveraging Kubernetes orchestration, as Santos et al. [61] have also proposed. Similarly, [30, 62, 63] present such dynamic middleware. While these systems are flexible, they can only optimize latency and do not take system costs into account. Rather, they assume that a specific set of infrastructure already exists along with a mapping that does not lead to under-provisioning. In our proposed process, we provision only infrastructure that is actually needed, keeping overall cost to a minimum. We argue that a more efficient fog application can be designed by building the underlying infrastructure in parallel. Furthermore, the infrastructure is often not yet fixed at the start of the development process.
To this end, Roy et al. [64] present MAQ-PRO, a process for infrastructure capacity planning for component-based applications that is similar to our proposed process. MAQ-PRO begins with a profile of components, analysis of the application scenario (cf. Section 3.2), and a base performance model (cf. Section 3.4), and it also considers SLA bounds and workloads. Their approach, however, is unsuitable for the novel paradigm of fog computing as it does not consider network distance between infrastructure components, which is crucial in the fog.
In Section 3.4, we propose using FogExplorer to simulate fog placement. Alternatively, Gupta et al. [11] have proposed the iFogSim tool to model and simulate the use of fog application resources. Their tool, however, only allows tree-shaped infrastructure models, which are not representative of most fog infrastructure, as it can contain cycles, such as in our case study. Furthermore, their tool requires highly detailed application traces, which are not available this early in the design phase. A further alternative is IoTSim-Edge [65], a simulation framework for IoT applications in fog environments based on CloudSim. As our process only defines abstract steps, either simulation tool could be used in our process instead of FogExplorer if the user so wishes.
In [66], Brambilla et al. present an approach for simulating large scale sensor networks for the IoT. While useful in its own right, it lacks an estimation of system cost. We target more heterogeneous fog networks, albeit at a lower scale. Additionally, [34, 67-71] also present simulation tools that could be applied to fog computing.
We also propose using MockFog as an emulated testbed for different application designs in Section 3.5. Besides MockFog, other application testbeds exist as well. Eisele et al. [33] propose a hardware-in-the-loop simulation that uses a simulation tool in conjunction with a physical testbed. This allows them to combine flexible workload generation from the simulation tool with a realistic environment from the physical testbed. However, it also leads to increased cost without being entirely accurate. The D-Cloud [72] software testing framework allows individual software components to be placed on different virtual machines to emulate a cloud environment. This tool, however, cannot be applied to a fog infrastructure. Furthermore, Coutinho et al. [18] and Mayer et al. [17] propose Fogbed and EmuFog, respectively, which use the network simulators Mininet and Maxinet [73] to test distributed fog applications. Yet unlike MockFog, these testbeds can only simulate realistic network conditions, not the constrained compute capabilities of fog nodes, especially at the edge. Balasubramanian et al. [74] present a testbed for fog applications that facilitates emulating these constraints but requires physical hardware for each node rather than cheaper virtual machines.
Luthra et al. [75] present ProgCEP, an operator-based programming model for complex event processing in the fog. ProgCEP allows placing application operators on fog nodes through an API and facilitates QoS monitoring of that application. While the system does not offer its own operator placement algorithm, it aims to aid the development of such algorithms. It could therefore be used to test different application design options on physical infrastructure, which is part of our design process. A similar dynamic migration is also implemented in FogBus [76].
To the best of our knowledge, our work is the first that combines best practices, simulation, and emulation into a complete design process for fog-based IoT applications.

Conclusion
Engineering IoT applications in an efficient way is challenging as the process needs to consider both the software architecture and its deployment to a physical infrastructure. Existing approaches can only provide limited guidance since they are either based on theoretical models and simulation (i.e., inherently limited in their accuracy) or based on experimental testbeds (i.e., the evaluation effort is too high to explore more than a few design options).
In this paper, we have proposed a three-step process for designing efficient fog-based IoT applications that integrates and extends previous work of ours. Rather than relying solely on global optimization, simulation, or testbed benchmarking, we combine best practices, simulation, and testbed evaluation to choose the most efficient infrastructure options and software service placements from an exponentially growing pool of deployment options. Furthermore, we have shown the effectiveness of this approach through a smart factory case study. By deploying different options on a physical testbed, we also showed that our process identified an efficient application design in our case study and, by extension, that it achieves the desired results.

B Overview of Application Design Options Deployed to the Physical Testbed in Case Study
Table 5: Overview of the ten application design options selected for deployment on the physical testbed. W denotes the most efficient design as determined by our process. M1-3, F1-3, and B1-3 denote the three designs that were filtered out by MockFog, FogExplorer, and the application of Best Practices respectively.
(b) Infrastructure options in the different application design options tested on the physical testbed. Hardware options for the Camera and Production Controller have been omitted for brevity as no service is deployed on these nodes.