Monitoring self-adaptive applications within edge computing frameworks: A state-of-the-art review

Recently, a promising trend has evolved from centralized computation toward decentralized edge computing in the proximity of end-users to provide cloud applications. To ensure the Quality of Service (QoS) of such applications and the Quality of Experience (QoE) of end-users, it is necessary to employ a comprehensive monitoring approach. Requirement analysis is a key software engineering task throughout the whole lifecycle of applications; however, the requirements for monitoring systems within edge computing scenarios are not yet fully established. The goal of the present survey is therefore threefold: to identify the main challenges in the field of monitoring edge computing applications that are as yet not fully solved; to present a new taxonomy of monitoring requirements for adaptive applications orchestrated upon edge computing frameworks; and to discuss and compare widely-used cloud monitoring technologies with respect to assuring the performance of these applications. Our analysis shows that none of the existing widely-used cloud monitoring tools yet provides an integrated monitoring solution for edge computing frameworks. Moreover, some monitoring requirements are not thoroughly met by any of them.


Introduction
In recent years, a wide variety of software solutions, such as Internet of Things (IoT) applications, have emerged as cloud-based systems. As a consequence, billions of users and devices are connected to applications on the Internet, which results in trillions of gigabytes of data being generated and processed in cloud datacenters. However, the burden of this large data volume, generated by end-users or devices and transferred toward centralized cloud datacenters, leads to inefficient utilization of communication bandwidth and computing resources. Since all the resource capacity and computational intelligence required for data processing reside principally in cloud-centric datacenters, data analytics for current cloud solutions (e.g. Amazon AWS IoT and Google Cloud Dataflow) is still an open research problem.
To overcome this problem, modern cloud frameworks such as edge (Shi and Dustdar, 2016), fog (Bonomi et al., 2014) and osmotic (Villari et al., 2016) computing aim to increase the capabilities and responsibilities of resources at the edge of the network compared to traditional centralized cloud architectures, not only by placing services in the proximity of end-users or devices, but also by using new data transfer protocols to improve the interaction with datacenter-based services. This also provides low-latency response times for the application.
Along these lines, the main goal of the SWITCH project is to introduce a novel software design model by which QoS/QoE objectives can be included in the complete lifecycle of applications running in modern cloud frameworks, including edge computing.
Based on this conceptual model, the SWITCH solution provides an environment for developing software systems and monitoring their execution, and an autonomous platform for adapting system behavior. The focus of the present paper is on monitoring such self-adaptive applications within an edge computing context, where some data processing takes place at the edge of the network; monitoring and self-adaptation of this processing is therefore necessary in order to ensure that effective use is made of centralized cloud facilities, edge devices, and the entire infrastructure that an edge computing framework provides.
As a motivating example, Fig. 1 provides the schema of one modern computing framework, the edge computing architecture, which includes different layers: (I) Centralized cloud computing, (II) SDN/NFV technologies, (III) Edge computing, and (IV) IoT objects, sensors or users.
The multi-layer architecture, shown in Fig. 1 as an example of pioneering cloud computing frameworks, is described below:

Layer (I) Centralized cloud computing layer: The centralized cloud computing layer includes cloud datacenters which can belong to various providers. This layer can be utilized for long-term storage and application-level data processing operations that are typically less time-sensitive. In other words, the capability of this layer is to use centralized cloud infrastructure to run big data analytics and process less time-sensitive data. In this layer, applications may be composed of different, modular services, each one performing different high-level data processing according to users' requirements for various purposes. For instance, an IoT disaster early warning application might have two services in this layer: a "warning trigger" service and a "call operator" service. A "warning trigger" may be a surveillance service that processes the incoming data measured by sensors in order to notify another service, e.g. "call operator", when irregular incidents occur. The "call operator" service may decide whether or not to send an alert to emergency systems or public entities.
Layer (II) SDN/NFV technologies layer: Software-Defined Networking (SDN) (Astuto et al., 2014) and Network Functions Virtualization (NFV) (Jesus-Gil and Botero, 2016) are emerging as new ways of designing, building and operating networks. These two complementary technologies are able to support the transition of data between edge nodes and cloud datacenters. The introduction of these technologies can greatly improve the dynamism and manageability of the network. For example, it is possible to change the path through which data flows between the centralized cloud computing layer (the first layer) and the edge computing layer (the third layer) if the current network quality is not satisfactory. In combination, SDN and NFV have the potential to specify network flows, enhance network performance and also provide a network management abstraction independent of the underlying networking devices. In this way, SDN and NFV can be considered enabling solutions that steer the evolution not only of network environments, but also of new cloud-based application architectures such as edge computing frameworks.
Layer (III) Edge computing layer: At the edge computing layer, edge nodes represent gateways and data capturing services able to act on raw data, for example aggregating, filtering, encrypting and encoding the local data streams in real-time. This layer is where cloud resources are distributed and moved near to the end-users and end-devices; this is the main reason that edge computing has also been called ubiquitous computing. The rationale for employing edge nodes is to analyze time-sensitive data closer to the location where these data streams are collected, hence taking some of the computational load off the resources at the centralized cloud computing layer (the first layer) and, in some cases, reducing network load too (Satyanarayanan, 2017) at the SDN/NFV technologies layer (the second layer). Edge nodes can also have other functionalities, dependent on the requirements of individual use cases. In an early warning system, these nodes consist of services that receive data over a direct link from sensors, filter the input data stream, aggregate the measured values and send the data to the "warning trigger", which is another service running at the centralized cloud computing layer. Hence, an edge computing layer can offload significant traffic from the core network and datacenters. That is why this new paradigm, as an extension of the centralized cloud, provides low latency and location awareness, and optimizes users' experience under QoS requirements for time-critical and even real-time applications.
Layer (IV) IoT objects/sensors/users layer: In this layer, connected devices have a pervasive presence on the Internet. Objects (e.g. smart home modules) can be remotely controlled; sensors are able to measure different parameters (e.g. temperature, barometric pressure, humidity and other environmental variables); and users are capable of using online software solutions via connected devices such as mobile phones.
The multi-layer edge computing architecture depicted in Fig. 1 offers the following improvements over the classical cloud computing model:

• Reducing the amount of network traffic: Edge nodes at the edge computing layer (the third layer) are able to filter unnecessary data and aggregate only the key information that should be streamed to the centralized cloud computing layer (the first layer) to be received, stored and processed.

• Improving the application performance: Processing data locally on edge nodes at the edge computing layer (the third layer), next to end-users or end-devices rather than at the centralized cloud computing layer (the first layer), can be exploited as a solution to shrink latency, reduce response time and hence improve the application QoS.

• Facilitating new approaches to load-balancing: The edge computing paradigm has introduced new service migration functionalities, such as the movement of running services between the centralized cloud computing layer (the first layer) and the edge computing layer (the third layer) to support load-balancing on demand (Sharma et al., 2017). It can also provide highly improved load-balancing behavior by scaling computing power locally on edge nodes, when compared to traditional, centralized computation.

• Providing awareness of location, network and context information: As a consequence of the edge computing architecture, it is now possible to track end-users' information such as their location, mobility, network condition, behavior and environment in order to efficiently provide customized services. This ensures that end-users' needs and preferences for QoE (as a direct measurement of users' satisfaction) are accounted for.

• Minimizing energy consumption: Rapid growth in the number of objects and users connected to the Internet has always been associated with demand for energy efficiency. Tasks can be offloaded from end-devices to nearby edge nodes rather than to distant centralized cloud datacenters. This helps to reduce the energy consumption at end-devices, at centralized computing infrastructures and at network points between the edge computing layer and the centralized cloud computing layer.
The performance of edge computing applications varies significantly depending on runtime variations in operating conditions, e.g. the number of arriving requests to be processed, the availability of virtualized resources, the network connection quality between different application components distributed over the Internet, etc. Therefore, tracking dynamic changes of operational environments is essential in order to identify and remedy any deterioration of system health. To this end, the use of monitoring capabilities in every layer of an edge computing framework helps the cloud-based application provider to recognize where any performance bottlenecks are. Besides this, it allows the system to predict potential issues and to enhance application performance so as to avoid QoE degradation experienced by the user.
Such advanced cloud-based frameworks, providing highly distributed, heterogeneous and even federated environments, can exploit lightweight container-related virtualization technologies (such as CoreOS, Kubernetes, OpenShift Origin and Docker Swarm) for the automatic deployment of different services which communicate with each other through well-defined, efficient mechanisms. Therefore, various monitoring requirements in terms of cloud infrastructure, container virtualization, communication network and application-specific conditions are present when using edge computing platforms.
The significance of a monitoring system fully configured for application adaptation has been discussed in experience studies in various cloud contexts. Such studies present a wide range of monitoring technologies and methodologies of diverse applicability in practice. Various research works have so far analyzed the monitoring tools and approaches needed for cloud applications (Aceto et al., 2012; Aceto et al., 2013; Fatema et al., 2014; Garcia-Valls et al., 2014; Mohamaddiah et al., 2014; Ward and Barker, 2014; Hazarika and Singh, 2015; Alcaraz-Calero and Aguado, 2015; Sugapriya and Jeya, 2015; Alhamazani et al., 2015). However, such monitoring studies have not addressed the area of self-adaptive applications within edge computing frameworks, which represent a new era of cloud computing. This is an important area, since failing to analyze and determine the whole spectrum of monitoring requirements in the right way may lead to massive software engineering failures for these types of applications.
The primary goal of the present paper is to explore the fundamental challenges to the evolution of monitoring in the edge computing context that are not well addressed in academic literature and industry. In order to understand how these challenges can currently be met, we focus on the comparison and analysis of the characteristics of existing cloud monitoring technologies to determine their strengths and weaknesses precisely in this domain.
More specifically, the main contribution of this paper can be summarized as follows: (I) providing a systematic analysis of different monitoring concepts within edge computing frameworks; (II) identifying the main open challenges and significant technical issues in such modern monitoring approaches; (III) providing a taxonomy of monitoring requirements needed to support dynamic adaptation of applications within edge computing frameworks; and (IV) presenting the future research directions for monitoring techniques in the adaptation of edge computing applications.
The rest of the paper is organized as follows. Section 2 presents a requirement analysis of monitoring levels within edge computing frameworks; we finish this section with an overview of challenges. Section 3 develops a taxonomy of monitoring requirements for edge computing applications. Section 4 discusses existing widely-used monitoring technologies and the level to which they address the requirements identified from an edge computing viewpoint. The conclusion appears in Section 5.

Monitoring levels
To adapt edge computing applications to the changing execution environment and ensure that application QoS requirements continue to be satisfied, it is necessary to employ a comprehensive monitoring system able to address the whole spectrum of requirements, pertaining to different levels including (1) the underlying infrastructures (e.g. VMs' computing resources), (2) edge computing platforms (e.g. Docker containers), (3) network connections between individual application components and (4) application-specific measurements (e.g. service response time).
This section explains a requirement analysis of all four previously-mentioned monitoring levels for cloud-based applications from an edge computing viewpoint. Modern software engineering provides an approach to design such applications as a set of loosely coupled, independent components running either in Virtual Machines (VMs) or containers; hence, for completeness, we consider both VM and container levels of virtualization. Emphasis in this section is put on the importance of monitoring needs for self-adaptive applications from an edge computing viewpoint, in order to present a new taxonomy of monitoring requirements for such applications in Section 3.

Table 1
VM-level monitoring studies.

Reference                      Purpose                                               Monitored metrics
Wood et al. (2008)             Modeling resource utilization of cloud applications   CPU, disk, network usage
Kwon and Noh (2013)            IaaS cloud monitoring                                 CPU, memory, disk
Meera and Swamynathan (2013)   IaaS cloud monitoring                                 CPU, memory
Clayman et al. (2010a)         Monitoring federated clouds                           CPU, memory, network usage
Caglar and Gokhale (2014)      Intelligent resource provisioning                     CPU, memory

VM-level monitoring
All the physical resources, including CPU, memory, disk and network, can be virtualized. Multiple VMs can be deployed on a single physical machine and thus share the physical resources between each other. Based on the vision of edge computing, it is necessary to have control over a pool of configurable virtualized resources exploited in both the centralized cloud computing layer (the first layer depicted in Fig. 1) and the edge computing layer (the third layer shown in Fig. 1). It should be possible for such resources to be autonomously provisioned and de-provisioned with little, or preferably no, intervention by the application provider. In order to achieve efficient resource utilization and prevent any problems in virtualized resources, monitoring of the VMs used in the cloud datacenters and edge nodes is critical. Performance optimization can be best achieved by efficiently monitoring the utilization of these virtualized resources. Capabilities for monitoring such resources mainly cover usage of CPU, memory, storage and network:

• CPU usage shows the amount of actively used CPU as a percentage of the total available CPU in a VM. If processor utilization reaches 100% and the CPU run queues start filling up, the system has run out of available processing capacity and adaptation action must be taken at that point, or, preferably, before that point, in anticipation.

• Memory usage indicates the percentage of memory that is used on the selected machine.

• Disk usage refers to the amount of data read or written by a VM, or the percentage of used drive space. Adding additional storage to the VM and allocating it to the appropriate partition can often resolve disk space issues.

• Network usage is the volume of traffic on a specific network interface of a VM, including external and internal data traffic.
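As a simple illustration of how a monitoring agent might collect these four metrics, the sketch below reads the Linux /proc interface and standard library facilities. It is a minimal, Linux-only sketch written for this survey, not code taken from any of the tools discussed here:

```python
import os
import shutil

def read_meminfo():
    """Parse /proc/meminfo (Linux) into a dict of values in kB."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.strip().split()[0])
    return info

def memory_percent():
    """Percentage of RAM currently in use on this machine."""
    m = read_meminfo()
    return 100.0 * (m["MemTotal"] - m["MemAvailable"]) / m["MemTotal"]

def disk_percent(path="/"):
    """Percentage of used drive space on the partition holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def cpu_load_per_core():
    """1-minute load average normalised by core count; a value approaching
    or exceeding 1.0 indicates the run queues are filling up, i.e. the
    adaptation trigger described above."""
    return os.getloadavg()[0] / os.cpu_count()

def network_bytes():
    """Cumulative bytes received/sent per interface, from /proc/net/dev;
    sampling twice and differencing yields a traffic rate."""
    counters = {}
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:  # skip the two header lines
            iface, data = line.split(":", 1)
            fields = data.split()
            counters[iface.strip()] = (int(fields[0]), int(fields[8]))  # (rx, tx)
    return counters
```

A real agent would sample these functions periodically and ship the values to a collector; the counters in `network_bytes` are cumulative, so a rate requires two samples.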
Relevant papers that have been published in this area are included in Table 1 .We provide more details in the following paragraphs.
Al-Hazmi et al. (2012) adopted a cloud monitoring system to provide Monitoring-as-a-Service (MaaS) usable by both cloud-based application providers and customers, who may have different views on the monitoring data in a multi-tenant environment. Their monitoring solution works across federated clouds; however, it is restricted to datacenters' monitoring tools. This approach needs all infrastructures to apply the same monitoring system, whereas an important requirement in edge computing frameworks is for a monitoring solution which is not cloud-specific and is capable of working across federated testbeds.

Wood et al. (2008) developed a mathematical model to estimate resource overhead for a VM. The proposed model can be adopted for approximating the virtualized resource requirements (especially CPU) of any application on a given platform. Moreover, the model can be used to estimate the aggregated resource needs of VMs co-located on one host. Their solution defines the minimum amount of resources necessary for a VM to avoid performance reduction due to resource starvation. However, their approach is not able to directly measure how application performance (measured, for example, by response time) will vary. Another notable point in this work is that profiling the resource usage of virtualized applications is offline and occurs on monthly or seasonal timescales, which makes the proposed model less useful in the dynamic environment of edge computing scenarios.

Kwon and Noh (2013) demonstrated an architecture for a monitoring system consisting of a dashboard showing real-time resource utilization for servers and VMs. It was mentioned that if CPU, memory and storage are overloaded, the virtual servers will not be able to perform their normal function. However, the article did not explain how the experiment could be implemented in practice, and therefore it is not completely clear that the proposed solution can be used to improve the performance of VMs, as the authors claim. It should be noted that the proposed monitoring architecture is usable only for a specific kind of virtualization (Xen hypervisors).

Meera and Swamynathan (2013) proposed an agent-based resource monitoring system which provides VM-related information (CPU and memory utilization) to the cloud-based application provider for efficient resource optimization. The proposed monitoring architecture could be improved by adding an alarm feature that triggers if a value breaches a threshold, which can be used for many purposes such as failure prediction or Service Level Agreement (SLA) assessment. This architecture is limited by its dependence on centralized coordination; in contrast, with regard to the needs of self-adaptive edge computing solutions, monitoring agents which report the availability of resources should be autonomous.

Clayman et al. (2010a) described a monitoring framework called Lattice which is able to measure and report system parameters of virtual infrastructures for the management of cloud-based services in real-time. This monitoring framework provides the necessary libraries along with APIs to implement a customized monitoring system, and it could be considered more a toolkit than a ready-to-use tool. The main functionality of the Lattice monitoring framework is the collection and distribution of measurement data through either UDP or multicast protocols; the proposed solution does not include functionality for visualization.

Caglar and Gokhale (2014) presented an autonomous, intelligent resource management tool called iOverbook, usable in heterogeneous and virtualized environments. This tool provides an online overbooking strategy based on a feed-forward neural network model that carefully considers historic resource usage to forecast the mean hourly CPU and memory usage one step ahead. However, their work could be improved by effective filtering of potential outliers, which can quench heavy data transmission overhead, especially on edge nodes in large-scale edge computing environments. Also, in order to make the proposed solution capable of working with a high sampling frequency in highly dynamic environments, a further step could be to consider various time intervals instead of a fixed hourly rate.

Container-level monitoring
In comparison with VMs, the use of containers, which do not require an Operating System (OS) to boot up, as another form of server virtualization (Seo et al., 2014) is rapidly increasing in popularity. According to the edge computing trend in which cloud environments are becoming more dynamic and workloads vary over time, using this lightweight cloud technology can support self-adaptation of the entire system to address the needs of application providers and users. This support is also due to the particularly interoperable service packaging and orchestration that this technology provides in order to bind the various software components that build the whole application. It means that decentralized edge clouds, which move computation and data management to the edge of the network and away from datacenters, require a lightweight distribution and orchestration of portable application components such as containerization.
If the system uses containers to run application services in both the centralized cloud computing layer (the first layer depicted in Fig. 1) and the edge computing layer (the third layer shown in Fig. 1), container-level monitoring becomes mandatory.
According to the literature (Stankovski et al., 2016; Preeth et al., 2015; Beserra et al., 2016; Vangeepuram, 2016; Dusia et al., 2015), the common set of container-level metrics to be monitored, and useful in the context of application adaptation, is shown in Table 2.
Besides this, there are different tools provided specifically to monitor containers and display the runtime values of key attributes for a given container, as listed in Table 3.
All container-specific monitoring tools compared in Table 3 provide a REST API to expose statistics about a given container and this remote API can be externally invoked by other entities.
Docker provides a built-in command called docker stats which reports runtime metrics and resource usage for a given container.
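To illustrate how such statistics are typically consumed, the sketch below computes a container's CPU percentage from an abbreviated stats payload, using the delta-based formula that docker stats itself applies (current sample minus previous sample, scaled by the number of online CPUs). The sample values are invented for illustration:

```python
import json

# Abbreviated sample of the JSON returned by `GET /containers/<id>/stats`
# (values are illustrative, not from a real container).
SAMPLE_STATS = json.loads("""
{
  "cpu_stats":    {"cpu_usage": {"total_usage": 8500000000},
                   "system_cpu_usage": 50000000000, "online_cpus": 4},
  "precpu_stats": {"cpu_usage": {"total_usage": 8000000000},
                   "system_cpu_usage": 46000000000}
}
""")

def cpu_percent(stats):
    """CPU usage of one container, following the delta-based formula
    used by `docker stats` (container delta over system delta)."""
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - stats["precpu_stats"]["cpu_usage"]["total_usage"])
    system_delta = (stats["cpu_stats"]["system_cpu_usage"]
                    - stats["precpu_stats"]["system_cpu_usage"])
    if system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * stats["cpu_stats"]["online_cpus"] * 100.0

print(cpu_percent(SAMPLE_STATS))  # → 50.0
```

In a live deployment the payload would be fetched from the Docker Remote API rather than hard-coded as here.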
Retrieving a detailed set of metrics is also possible by sending a GET request to the Docker Remote API. Container Advisor (cAdvisor) is an open-source system that measures, aggregates, processes and displays monitoring data about running containers. This monitoring information can be used to understand the runtime resource usage and performance characteristics of running containers. cAdvisor shows data for the last 60 s only; however, it supports the ability to easily store the monitoring information in an external database such as InfluxDB that allows long-term storage, retrieval and analysis.
InfluxDB is an open-source Time Series Database (TSDB) capable of real-time and historical analysis. Complementing this, Grafana is an open-source Web-based user interface to visualize large-scale monitoring information. It is able to run queries against the database and show the results in an appropriate scheme. On top of cAdvisor, using Grafana and InfluxDB can effectively improve the visualization of the parameters collected by cAdvisor, presenting concise charts for any time period.
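As an illustration of how a collector might hand one sample to InfluxDB, the sketch below renders a measurement in InfluxDB's line protocol, the textual format accepted by the HTTP write endpoint of its 1.x API. The measurement, tag and field names are illustrative choices, not prescribed by any tool discussed here:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one monitoring sample in InfluxDB line protocol:
    <measurement>,<tag_set> <field_set> <timestamp>."""
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_set = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"

# One CPU sample for a hypothetical edge-node container, ready to be
# POSTed to InfluxDB's /write HTTP endpoint (1.x API):
line = to_line_protocol("container_cpu",
                        {"container": "edge-gw-1"},
                        {"usage_percent": 12.5},
                        1465839830100400200)
print(line)  # → container_cpu,container=edge-gw-1 usage_percent=12.5 1465839830100400200
```

Note that a production encoder would also have to escape spaces and commas inside tag values, which this sketch omits.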
Prometheus is an open-source monitoring tool as well as a TSDB. It gathers monitoring parameters from pre-defined resources at specified intervals, shows the results, evaluates rule expressions, and is capable of triggering alerts if the system starts to experience abnormal behavior. Prometheus uses LevelDB as its local storage implementation for indices and for storing the actual sample values and timestamps. Although cAdvisor, in comparison with Prometheus, has been considered the easier tool to use, it is limited in alerting on the occurrence of an identified event when something requires attention. However, neither may by itself provide turnkey scalability capable of handling large numbers of monitored containers.
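To illustrate how such alert conditions are evaluated, the sketch below parses an abbreviated response in the format returned by Prometheus's instant-query HTTP API (/api/v1/query) and applies a simple threshold. The metric and container names are invented for illustration; a real deployment would express the condition as a Prometheus alerting rule rather than client-side code:

```python
import json

# Abbreviated sample response from Prometheus's /api/v1/query endpoint
# (metric name and values are illustrative).
SAMPLE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
   "result": [
     {"metric": {"__name__": "container_memory_usage_percent", "container": "edge-gw-1"},
      "value": [1712000000.0, "91.4"]},
     {"metric": {"__name__": "container_memory_usage_percent", "container": "edge-gw-2"},
      "value": [1712000000.0, "40.2"]}]}}
""")

def breaching(response, threshold):
    """Return the containers whose sampled value exceeds the threshold,
    mimicking what a Prometheus alerting rule expression would match."""
    out = []
    for sample in response["data"]["result"]:
        ts, value = sample["value"]  # the sample value arrives as a string
        if float(value) > threshold:
            out.append(sample["metric"]["container"])
    return out

print(breaching(SAMPLE, 90.0))  # → ['edge-gw-1']
```
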
Docker Universal Control Plane (DUCP) is a tool to manage, deploy, configure and monitor distributed applications built using Docker containers.This container management solution supports all the Docker developer tools such as Docker Compose to deploy multi-container applications across clusters.High scalability and Web-based interface are some of the key features of DUCP as a Docker native commercial solution.
Scout is another container monitoring tool which has a Web-based graphical management environment and is able to store at most 30 days of measured metrics. It includes a logical reasoning engine capable of alerting based on metrics and their associated predefined thresholds. Similar to Scout, there are many commercial solutions to monitor containers with the same characteristics.
Container-level monitoring is currently a hot research topic compared to related areas, for example monitoring of cloud infrastructures. Relevant papers that have been published in this area are included in Table 4.

Stankovski et al. (2016) proposed a distributed self-adaptive architecture that applies the edge computing concept with container-based technologies such as Docker and Kubernetes to ensure the QoS for time-critical applications. Their idea is to deploy the containerized application (a file upload use case) in different geographic locations in a way that the service is created, served and destroyed for every file upload request. For each container, the features of the resources required for the host can be allocated based upon monitoring data and operational strategies defined by the end-user, application developer and/or administrator.

Preeth et al. (2015) evaluated the performance of Docker containers based on system resource utilization. From their benchmarks, the authors found that container-based virtualization is comparable to an OS running on bare metal in terms of memory, CPU and disk usage; with regard to these three metrics, the performance of Docker approximates the performance of the native environment. However, considering network utilization, the host OS has considerably less network bandwidth compared to the Docker container. In this work, one host was allocated to just one container to simplify the experiments. In addition, container performance against other virtualization technologies could be evaluated as a further complement to this work.

Beserra et al. (2016) analyzed the performance of container virtualization in comparison with VM-based virtualization for HPC disk I/O-bound jobs. The evaluation results showed that containerized environments generally work more effectively than VMs for disk I/O and data-intensive applications. Moreover, both virtualization technologies achieved the same performance when there was just one abstraction per physical server.

Vangeepuram (2016) undertook an experimental performance comparison of the Apache Cassandra TSDB between Linux containers and bare metal. They considered three different workload scenarios: write, read and mixed (read & write) loads. The results showed that the Cassandra cluster on bare metal exhibits lower CPU utilization than when it is containerized in both the write and mixed scenarios, but there is no significant difference for the read scenario. The Cassandra cluster on bare metal has lower latency compared to containers in all scenarios. Furthermore, considering disk throughput, no concrete inference could be drawn from the findings. However, memory utilization was not considered in this work. Monitoring memory consumption would have provided more realistic results, since the performance of Cassandra becomes poor if the system does not have enough memory, as it starts to do mostly garbage collection at some point.

Dusia et al. (2015) introduced a mechanism to guarantee network QoS for time-critical applications using Docker. Their implementation is able to prioritize the network access of all containers running on a host in such a way that containers with higher priority are given a larger share of the total available network bandwidth. In this way, a containerized application that is network-bandwidth intensive cannot end up with poor or undesirable execution due to other applications sharing a Docker host. However, the authors performed their experiments in a static setting, without considering current dataflow requirements or ongoing buffer status.

End-to-end link quality monitoring
Cloud-based applications, such as early warning systems, have time-critical requirements, such as minimal delay and jitter tolerance, and require suitable support to achieve guaranteed network-based QoS. This is challenging because performance is difficult to maintain if the conditions of the network infrastructure change continuously.
The idea that some application services are deployed on the nodes at the edge of the network and others on centralized datacenters has raised serious concerns about the network quality of links between these services across an edge computing framework.This is a challenging research area, because it relates not only to live-migration of services between edge nodes and datacenters, but also among different nodes at the same layer in edge computing frameworks.
Network performance for all communications passing through the edge computing framework has to be measured by end-to-end link quality monitoring. Regardless of which network technologies are being used, the edge paradigm contains four types of network connections among application components to be considered:

• Communications between a cloud datacenter and an edge node: new enabling technologies, such as SDN and NFV, provide a basis for applying advanced principles on how networks can be appropriately developed, implemented, deployed and operated between the cloud datacenter and edge nodes in an edge computing framework. In recent years, SDN and NFV have opened up novel opportunities, providing a method for virtualizing the network on demand based on end-to-end connection quality at any time. Network components (e.g., routers, bridges, switches) can be virtualized by NFV, which makes it possible to dynamically instantiate, migrate and scale up or down network functions such as routing, packet forwarding and firewall services. Alongside NFV, SDN offers a set of APIs and control protocols, such as SNMP and OpenFlow, that enable the network to be programmable, managed and automated.

• Communications between edge nodes: edge nodes at various geographical locations manage a pool of virtualized resources locally. In this way, collaborative provisioning and content delivery among peered edge nodes helps the application provider improve the entire application performance. The data transmission among these nodes can be performed either by a centralized approach such as SDN, or via a fully distributed method through traditional routing protocols, e.g. OSPF (Verma and Bharadwaj, 2016).

Table 5 (excerpt)
Monitoring research focused on end-to-end link quality.
Mohit (2010): support application QoS (for communication services); metrics: throughput, delay, packet loss.
Hsu and Lo (2014): ensure QoE to the users (for multimedia services); metrics: throughput, delay, packet loss, jitter.
Taherizadeh et al. (2016a): ensure QoE to the users (for data streaming services); metrics: throughput, delay, packet loss, jitter.
Cervino et al. (2011): support application QoS (for real-time streaming services); metrics: throughput, delay, packet loss, jitter.

According to the literature (Lampe et al., 2013; Chen et al., 2014; Samimi et al., 2007; Mohit, 2010; Hsu and Lo, 2014; Taherizadeh et al., 2016a; Cervino et al., 2011), the most important metrics to be analyzed for network measurement include:

• Network throughput, which is the average rate of successful data transfer through a network connection.
• Network delay, which specifies how long a packet takes to travel across a link from one endpoint or node to another. This metric can also mean Round-Trip Time (RTT), the time elapsed from the propagation of a message to a remote place until its arrival back at the source.
• Packet loss, which occurs when one or more packets of data traveling across a network fail to reach their destination.
• Jitter, which is the variation in the end-to-end delay of sequentially received packets. This network parameter is extremely important for real-time applications, e.g. oil exploration or connected vehicle applications, as jitter impacts the size of the associated data stream buffers.
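These four metrics can be summarized from one window of probe results. The following is a minimal Python sketch, assuming periodic probe packets with recorded round-trip times; the function and field names are illustrative, not taken from any of the surveyed tools:

```python
from statistics import mean

def link_quality(sent, received, bytes_received, window_s, rtts_ms):
    """Summarize the four end-to-end link metrics from one measurement
    window: probes sent/received, payload bytes delivered, window length
    in seconds, and the list of observed round-trip times."""
    throughput_bps = (bytes_received * 8) / window_s   # average successful transfer rate
    packet_loss = (sent - received) / sent             # fraction of probes that never arrived
    delay_ms = mean(rtts_ms)                           # average round-trip delay
    # Jitter as the mean absolute difference of consecutive delays
    # (a simple variant of the RFC 3550 interarrival jitter).
    jitter_ms = mean(abs(a - b) for a, b in zip(rtts_ms, rtts_ms[1:]))
    return {"throughput_bps": throughput_bps, "packet_loss": packet_loss,
            "delay_ms": delay_ms, "jitter_ms": jitter_ms}
```

For example, a window of 100 probes with 98 replies, 117,600 payload bytes in one second, and RTTs of 20, 22, 21 and 25 ms yields 2% packet loss and a mean delay of 22 ms.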
As can be seen in Table 5, some efforts have been made to research and build monitoring systems that focus on end-to-end link quality measurement in cloud environments. Lampe et al. (2013) explained that limitations of the network infrastructure, such as high latency, potentially affect the QoS of cloud-based computer games for a user. The authors focused on network latency measurement; their work would benefit from considering additional end-to-end link quality metrics, for example the effects of network disturbances such as increased packet loss or fluctuating throughput, which are noticeable indicators of network performance within edge computing frameworks. Chen et al. (2014) performed an extensive traffic analysis of two commercial gaming systems (StreamMyGame and OnLive). The results demonstrate that bandwidth limitations and packet loss have a negative effect on the graphic quality and the frame rates in the cloud gaming systems. In contrast, network delay does not predominantly impact the graphic quality of the gaming services, due to buffering. The authors focus on the users' perspective in cloud-based gaming systems; evaluating the performance of gaming services from the service providers' perspective remains another research area. Samimi et al.
(2007) introduced a model including a network-based monitoring system that enables dynamic instantiation, configuration and composition of services on overlay networks. The results show that, to simplify and accelerate the deployment and prototyping of communication services, distributed cloud infrastructures can be designed and used to dynamically adapt the service quality to workloads for which the service provider needs a large number of resources. However, when it comes to overlay networks, encapsulation techniques are not without drawbacks, including overhead, complications with load-balancing, and interoperability issues with devices such as firewalls. Mohit (2010) noted that computation-based infrastructure measurement is not adequate for the optimal operation of running cloud services; network-level evaluation of the cloud service is also very important. The author suggests an approach that combines different technologies, but without implementation details. Moreover, the solution relies on high-capacity edge routers, which are expensive and consequently not affordable in all use cases. Hsu and Lo (2014) presented a mapping from QoS to QoE, and thereby an adaptation model to translate end-to-end link quality metrics (including delay, packet loss rate, jitter and throughput) into the QoE of the end-user in multimedia services running on the cloud. To this end, they proposed a function to evaluate the QoE score after the user watches the streaming video. The results indicate that network QoS and users' QoE are consistent and linked together. Therefore, service providers are able to apply the proposed autonomous function to calculate users' QoE impression and to rapidly react to QoE degradation. The proposed approach does not take into account the trade-off between network cost optimization and quality, which is a significant factor to be considered in the adaptation of edge computing scenarios (Ahmed and Rehmani, 2017). However, it should be noted that different users have different objectives, e.g. conflicting objectives of price and quality. Taherizadeh et al. (2016a) proposed a non-intrusive network edge monitoring approach which is able to measure critical QoS metrics, including delay, packet loss, throughput and jitter, in order to adapt the service quality experienced by the users of real-time data streaming applications running on edge computing platforms. The authors claim that network edge-specific monitoring knowledge helps application providers accomplish more satisfactory adaptations to the user's conditions (e.g. network status). In this work, the main adaptation possibility is dynamically re-connecting users to a set of the most reliable servers offering fully-qualified network performance. Cervino et al. (2011) performed experiments to evaluate the benefits of deploying VMs in clouds for P2P streaming. The authors strategically placed network traffic distribution nodes in Amazon cloud infrastructures around the world. The main goal is to improve the QoS of live-streaming even in P2P video-streaming.

Application-level monitoring
The edge computing framework itself is application-agnostic, in that it is not dedicated to a single type of software system or purpose. However, every cloud-based application needs to be extended to include application-level monitoring capabilities to measure metrics that present information about the state of the service and its performance. Although a large number of research works consider the reliability of the underlying cloud infrastructures, there is still a lack of efficient application-level monitoring techniques able to detect and measure QoS degradation. In Table 6, different works to monitor application-level metrics are summarized. Leitner et al. (2012) proposed a monitoring system called CloudScale which measures a distributed application's performance at runtime and also adopts user-specified scaling policies for provisioning and de-provisioning of virtual resources. Their event-based approach models the workload behavior, supports multi-dimensional analysis, and defines the adaptation action. However, it considers only elasticity, which often amounts to increasing and decreasing the total number of computing nodes in the resource pool, regardless of application topology or reconfiguration. Evans et al. (2015) used container-based virtualization to run a Twitter analysis application called Sentinel. The application consists of multiple containerized components distributed in the cloud and provides Docker container reconfiguration on demand, as well as real-time service monitoring to inform the reconfiguration module to restructure the application based on changing circumstances (load, etc.). The proposed system is scalable in that running components can be dynamically duplicated in order to share the workload. Emeakaroha et al. (2012) implemented a general-purpose monitoring framework called CASViD which supports the measurement of low-level system metrics, for instance CPU and memory utilization, as well as high-level application metrics, which depend on the application type and performance. The results imply that CASViD, which is based upon a non-intrusive design, can define the effective measurement interval to monitor different metrics, offering effective intervals for varied workloads. As a next step, it would be possible to improve this framework to support multi-tier applications. This is challenging due to the distributed nature of edge computing applications: each application has different types of components with different application-level metrics. Farokhi et al. (2015) proposed a fuzzy autonomic resource controller to meet service response time constraints by vertical scaling of both memory and CPU, without either resource over- or under-provisioning. The controller module autonomously adjusts the optimal amount of memory and CPU needed to address the performance objective of interactive services, such as a Web server. Since the maximum memory or CPU capacity is limited, the proposed system could be extended to consider both vertical and horizontal scaling, to be able to handle the unlimited amount of workload possibly generated in large-scale edge computing scenarios. Xiong et al.
(2013) proposed a model-driven framework called vPerfGuard to achieve automated adaptive application performance. This approach, which is capable of identifying performance bottlenecks, has three modules: (1) a sensor element to collect runtime system metrics as well as application-level metrics, e.g. response time and application throughput; (2) a model-building element that enables the creation of a customized performance model showing the correlation between system metrics and application performance; and (3) a model-updating element to automatically detect when the performance model needs to be changed. A potential downside to this approach is that, if the performance analyst does not have enough knowledge or experience to define a proper performance metric for the model, the mechanism may not be usable. Moreover, outliers can have large effects on the regression, limiting the dependability of the obtained metrics as a basis for adaptation in some contexts. Mastelic et al. (2012) discussed how CPU or memory usage affects the response time, which in their case is the render time per frame. Their monitoring system includes the consumption of all processes belonging to the application. These processes have the same parent process, and hence by maintaining a list of process IDs (PIDs) for the monitored application, it is possible to sum up the resource consumption of all processes belonging to the application and calculate the total resource consumption at a certain point in time. This model could benefit from being extended to include other types of metrics, for instance network-level parameters, since monitoring and management of various real-time data streaming applications, including audio and video streaming services, is a big data challenge. Shao and Wang (2011) proposed a performance guarantee approach based on a Runtime Model for Cloud Monitoring (RMCM). In this work, a performance model is constructed from runtime monitored data using the linear regression algorithm. The relevant metrics include the resource allocated to the VM where the application resides, the number of co-existing applications on the same VM, the actual resource occupation by the application, the workload and so on. The results show that the performance model can be effective for controlling the provisioning approach to attain specified performance objectives. Because of the manual installation and configuration of monitoring agents in this work, non-functional requirements of monitoring, such as scalability and migration, may not be satisfiable. A further direction to pursue could be to consider how to find the effective measurement intervals. Wamser et al. (2017) focused on the live-migration of service instances for HTTP-based video streaming to cope with the impact of user mobility. They used an edge computing environment to obtain fast replacement of cloud services across different edge nodes if a user perceives poor video quality. In this work, supported by the INPUT project,14 a Deep Packet Inspection (DPI) monitoring tool was used to measure three application-level metrics, including frames per second, dropped frames and video quality, since these metrics have a positive correlation with the user's QoE. When any of the predefined thresholds for these metrics is violated for a specific period of time, the service has to be migrated from the current edge node to another one. Rossi et al.
(2015) introduced a model to estimate the response time of cloud applications from Linux OS counters, for example LoadAVG. One of the most important reasons to estimate application-level metrics from low-level metrics (e.g. CPU load) is that monitoring application-level metrics (such as response time) can cause overhead in both the network channel and computing resources. Moreover, monitoring high-level metrics can have privacy implications for users. The results show that the load values given by LoadAVG track the application response time behavior, indicating a strong positive correlation between the two. Their work considers only LoadAVG, and hence this model could be developed more completely by exploring other counters such as iostat and netstat. Jamshidi et al. (2015) presented a self-learning adaptation technique called FQL4KE, a fuzzy control method based on the Reinforcement Learning (RL) algorithm for learning optimal elasticity policies. This approach aims at automating the scaling process without leveraging any a priori knowledge of the running cloud application. The proposed architecture includes a learning module which constantly upgrades the knowledge base of the controller by learning adaptation rules suitable for the system. However, the proposed approach cannot deal with time-varying goals: if system goals change, the controller has to relearn everything from the beginning. A more fundamental problem in real-world environments is the fact that the number of situations can be enormous, and therefore the learning procedure could become impractical due to time constraints in new computing paradigms. Rao et al. (2011) used a distributed RL mechanism called iBalloon for self-adaptive VM resource provisioning, in which monitoring is essential for autonomic orchestration and adaptation. Nowadays, cloud infrastructures offer elastic resources via horizontal or vertical scaling solutions to adapt the application performance to the changing workload. However, current scaling approaches which utilize only infrastructure-related monitoring data may cause severe performance drops during workload variations at runtime. The authors claim that monitoring only infrastructure-level metrics, such as memory and bandwidth, without taking into account how application performance is behaving at runtime (application-level monitoring), would complicate the resource provisioning problem due to the lack of detailed measurement. In their work, according to the proposed vertical scaling approach, each VM is able to adjust its resource allocation in terms of CPU, memory, and bandwidth. The iBalloon architecture includes three fundamental elements: (1) a host-agent which is in charge of allocating resources to the VMs; (2) an app-agent which includes the monitoring part of iBalloon and reports runtime information about application performance; and (3) a decision-maker which hosts an RL agent placed at each VM to perform automatic resource capacity adjustment. However, iBalloon is limited because it does not consider other virtualized resources, e.g. storage, nor does it support other adaptation actions, e.g. migration to improve application performance. This is an issue because, in order to manage an application deployed on an edge computing framework, it is necessary to consider the migration of application components among heterogeneous infrastructures (Desertot et al., 2005).

14 The INPUT project, http://www.input-project.eu/.
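The RL-based controllers above (FQL4KE, iBalloon) can be illustrated with a minimal tabular Q-learning sketch for per-VM resource adjustment. This is not the authors' implementation: the state bands, the three actions and all learning parameters below are simplified, illustrative assumptions.

```python
import random

ACTIONS = (-1, 0, +1)  # release a resource share, hold, or acquire one

class RLScaler:
    """Minimal Q-learning sketch of an iBalloon-style per-VM controller.
    States are coarse utilization bands; the reward (supplied by the caller)
    would trade off measured performance against allocated capacity."""
    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = {}                       # Q-table: (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def state(self, utilization):
        # Discretize utilization in [0, 1] into 5 bands (0..4).
        return min(int(utilization * 5), 4)

    def choose(self, s):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((s, a), 0.0))

    def learn(self, s, a, reward, s_next):
        # Standard Q-learning update from one observed transition.
        best_next = max(self.q.get((s_next, b), 0.0) for b in ACTIONS)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (reward + self.gamma * best_next - old)
```

In each monitoring cycle the controller would observe the current band, pick an action, apply it to the VM's allocation, and feed the resulting reward back via `learn`; as the text notes, such a controller must relearn from scratch if the adaptation goals change.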
Islam et al. (2012) developed a proactive cloud resource management approach in which linear regression and neural networks are applied to predict and satisfy future resource demands. The research problem in this work is to analyze time-series monitoring data to extract a prediction model and other characteristics of the monitoring data. The proposed performance prediction model estimates upcoming resource utilization (e.g. the aggregated percentage of CPU usage of all running VM instances) at runtime and is capable of launching additional VMs to maximize application performance. The authors assessed predictive accuracy based on the application performance in terms of response time. This approach provides distributed scaling and could be enhanced to address the resource allocation of a single VM as well. At present, only CPU utilization is used to train the prediction model; the approach could further include other types of resources, e.g. memory, disk and bandwidth.
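The regression side of such a predictor can be sketched as an ordinary least-squares trend fitted over a sliding window of CPU-utilization samples and extrapolated one step ahead. This is a simplified illustration of the idea, not the model of Islam et al.; the function names and the scale-out threshold are assumptions.

```python
def predict_next(history, horizon=1):
    """Fit a least-squares linear trend to equally spaced utilization
    samples and extrapolate `horizon` steps past the last sample."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), history))
    slope = sxy / sxx
    return y_mean + slope * (n - 1 + horizon - x_mean)

def need_scale_out(history, threshold=0.8):
    # Launch an extra VM proactively when predicted utilization
    # would cross the (illustrative) threshold.
    return predict_next(history) > threshold
```

For a steadily rising window such as [0.5, 0.6, 0.7, 0.8] the predicted next value is about 0.9, so the controller would scale out before saturation is actually observed.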

Challenges to the evolution of monitoring in the edge computing context
Based on our analysis, the following are the most important current challenges in monitoring adaptive applications within edge computing frameworks:

• Mobility management: when a client device is moving, the varying network parameters of the link between end-user and edge node, such as jitter, delay and bandwidth, mean that application QoS can increase and decrease rapidly, in a way that is potentially difficult to predict (Ahmed and Ahmed, 2016). As the location of users or end-devices may change over time, dynamic service migration has gained increasing attention in the context of the edge computing paradigm to deliver always-on services.

• Scalability and resource availability at the edge of the network: lightweight jobs are most likely processed at the edge of the network, and hence edge nodes may have hardware capacity limits. At the same time, it should be guaranteed that these nodes are able to accommodate the increasing demand for delivering services and growing network traffic volume. Edge nodes should ensure the availability of the service regardless of the number of end-users' client devices at the edge network (Ahmed and Ahmed, 2016).

• Prior knowledge: supporting QoS constraints requires pre-knowledge of the execution environment, such as the underlying infrastructure and the configuration of application components on the network (Xiao et al., 2016). An adaptation technique is fully advantageous if it does not rely on knowledge provided by previous experience.

• Data management: in large-scale environments, monitoring probes generate massive amounts of collected data to be aggregated, processed and stored. Consequently, this poses further challenges, for instance the utilization of distributed datacenters with more bandwidth (Zhao et al., 2015) and flexible integration capabilities (Esposito et al., 2015).

• Coordinated decentralization: the challenge in building a decentralized system is to ensure that all the different application components collectively move the whole system towards a common goal.

• Saving expense, time and energy: on-demand resource allocation should be flexible enough to dynamically assign infrastructure according to the needs of modern self-adaptive cloud applications, and an important attribute of such flexibility is reducing their cost. However, cost optimization should not result in QoS/QoE degradation. What should be noted here is that application providers need to take into account where the adaptation logic is placed: separately, or together with the monitoring components. For example, whether the monitoring modules of the application are in charge of network cost optimization, or whether it is a task of the self-adaptation engine.

• Interoperability and avoiding vendor lock-in: the vendor lock-in situation is generally considered a disadvantage in cloud computing (Toosi et al., 2014). Recently, significant attempts have been made to address this challenge by driving the standardization of edge computing. For example, the open-source EdgeX Foundry project,15 which started in 2017 with the support of the Linux Foundation, is aimed at developing a vendor-neutral framework for IoT edge computing. Similarly, the OpenFog consortium16 has been founded by tech giants such as Dell, Cisco, Intel, Microsoft and ARM, in collaboration with Princeton University, to drive the standardization of fog computing. This consortium aims to leverage cloud, edge and fog architectures to enable IoT scenarios.

• Optimal resource scheduling among edge nodes: a scheduling mechanism must be intelligent enough to guarantee responses under the uncertainty of the runtime environment (Lee and Lee, 2015), such as changing workloads, users' channel diversity, unstable network conditions, user mobility and so on.

• Fault tolerance: the application should continue to operate in the presence of a fault (Chang et al., 2014), such as losing control over edge nodes or undetermined latency. This requires elements of decentralized control or off-line detection and recovery. Fault tolerance has received substantial attention for real-time systems due to their safety-critical nature.

• Proactive computing: the new paradigm is autonomously triggering decisions and actions by anticipating future states. To achieve this vision, time constraints must be taken into account, and it usually involves dealing with large amounts of historical and streaming data (Fournier et al., 2015).

• Replication of services: in the context of edge computing, services can be replicated across multiple geographically distributed cloud infrastructures (Farris et al., 2017). Replication of servers has its own technical issues, such as temporary inconsistencies, to be considered. Furthermore, different companies have different regulations on using (or not using) new technologies, e.g. internal legislation on the location of data storage.

• Container security: using container-based virtualization undoubtedly supports the edge computing paradigm, but poses new security threats. For example, containers are able to communicate directly with the host kernel, which can lead to security vulnerabilities in the system.

• Non-specific edge nodes: in the multi-layer edge computing framework shown in Fig. 1, edge nodes are meant to be a set of heterogeneous computing platforms, possibly not as powerful as cloud datacenters. However, industry rarely offers a solution for providing general-purpose edge nodes involved in computation and data analytics. In this regard, the LightKone project17 started in 2017 to move computation out of datacenters and directly to the edge of the network. As a further example, the main vision of another project, called Open Edge Computing (OEC),18 is that all nearby components, such as WiFi access points, DSL boxes and base stations, would be capable of offering resources through standardized, open mechanisms to any type of application to perform computation at the edge. In practice, edge nodes are typically not suited to handling workloads as general-purpose computation. Therefore, their monitoring mechanisms are generally tied to the proprietary use of a specific technology, and hence are not able to address multi-purpose needs.

15 The EdgeX Foundry project, https://www.edgexfoundry.org/.
16 The OpenFog consortium, https://www.openfogconsortium.org/.

Summary
In order to simplify the decision-making of self-adaptive mechanisms in the edge computing environment, monitoring solutions collect information from different levels. To this end, in addition to monitoring virtualized resources (e.g. CPU, memory, disk, etc.), it is important to consider other levels of monitoring, including the container, end-to-end network quality and application levels. In the current section, the reviewed papers were chosen so that all the important functional and non-functional monitoring requirements for adaptive applications within edge computing frameworks would be covered. This section also indicated recent challenges and future research directions needing to be explored in this field. From the analysis of the literature in the current section, we derive a new taxonomy of requirements for the development and deployment of relevant monitoring technologies within edge computing frameworks, which we present in Section 3.

Taxonomy of monitoring requirements in edge computing scenarios
Drawing on the discussion of the cited literature in Section 2, Table 7 presents a taxonomy of monitoring requirements needed to support dynamic adaptation of applications orchestrated upon edge computing frameworks. The table also lists 10 common functional requirements which are essential for all types of monitoring systems within edge computing scenarios. Based on this taxonomy, in Section 4 we analyze monitoring tools to identify their challenges and strengths.

Table 7
Taxonomy of requirements for monitoring systems within edge computing frameworks.

Functional requirements
Common in all monitoring levels

The functional requirements which are needed for basic monitoring within edge computing frameworks, and hence are common to all monitoring levels, are summarized in Table 7 and described below:

• Usable by the provider and customers in a multi-tenant cloud environment: this requirement means having the ability to define multiple roles and views for various types of users with different permissions to access monitoring data. Especially in a multi-tenant provisioning platform (He et al., 2012), where multiple tenants, potentially with various QoS values, share the same infrastructures and application instances, different tenants should be able to measure parameters and gain access only to the information that pertains to them.

• Provide the functionality for visualization: visualization is a key functionality for the analysis of events in dynamically changing environments, such as edge computing scenarios. It provides a powerful interface between the monitoring data stored in the TSDB and the human brain.

• Able to filter measured values to diminish data exchanges: it is important to provide threshold-based filtering capabilities on the monitored nodes to reduce the runtime communication overhead of monitoring data transmission and storage.

• Able to be tuned to any desired monitoring time interval:
Any custom time interval can be set, in seconds, minutes, hours, days or even weeks. An appropriate monitoring interval length is required to ensure reliability, to avoid overhead, and to prevent losing control over the running environment during adaptation actions. This functionality is one of the core competencies in building a monitoring approach that is optimized for the storage and retrieval of monitoring data, so that (for example) the data can be used to inform future adaptation strategies.

• Support scaling adaptation policies for large-scale dynamic environments: elasticity management of services within edge computing frameworks that supports scaling policies even in a large-scale environment needs a scalable monitoring solution, which is still an open issue left largely unsolved by many present monitoring systems. Different characteristics of a monitoring solution, such as the data storage mechanism, the communication protocol for data collection, the resources consumed to perform the monitoring activity, and the ability of automatic self-configuration to tune the monitoring system over time, may affect the scalability of applications in large-scale dynamic environments.
• Able to set up automated alerts: in edge computing scenarios, there is often a need to create custom alert rules that meet particular criteria. For example, a monitoring solution should be able to trigger alerts if a given VM or container instance starts to behave irregularly, or if a metric reaches its associated threshold.

• Automatic installation and configuration of the monitoring system: monitoring systems have to be able to automatically detect when a new VM, container or application instance is created due to scale-up elasticity actions. Auto-discovery can be considered an approach to automatic installation; it refers to the process of detecting new devices in a cloud environment and then performing ongoing monitoring of these devices without human intervention. Similarly, a running VM, container or application instance might cease to exist due to scale-down elasticity. Automatic installation and configuration of any monitoring system is a necessity to support scaling adaptation policies in large-scale edge computing environments.

• Able to be customized based on monitoring needs: it is necessary to support the customizability of monitoring solutions, such as metric extension (incorporating and starting to measure any new particular metric), which allows covering conditions particular to a specific environment, therefore providing the monitoring approach with a comprehensive view of the execution environment.
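The threshold-based filtering and automated-alert requirements above can be sketched as follows. The class names, dead-band width and rule format are illustrative, not taken from any particular monitoring tool:

```python
class DeadbandFilter:
    """Threshold-based filtering on a monitored node: a probe forwards a
    new sample only when it deviates from the last reported value by more
    than `band`, reducing monitoring traffic toward the collector."""
    def __init__(self, band):
        self.band = band
        self.last = None  # last value actually forwarded

    def should_forward(self, value):
        if self.last is None or abs(value - self.last) > self.band:
            self.last = value
            return True
        return False  # suppress the sample; it adds no new information

def check_alert(metric, value, thresholds):
    # Custom alert rule: fire when a metric crosses its configured threshold.
    limit = thresholds.get(metric)
    return limit is not None and value > limit
```

For instance, with a band of 5, a CPU reading of 53 following a reported 50 would be suppressed, while 56 would be forwarded; and a rule such as `{"cpu": 0.9}` fires on a reading of 0.95.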
Functional requirements at both VM and container levels for monitoring systems within edge computing frameworks are as follows:

• Independent from the underlying cloud infrastructure provider: the management of federated cloud environments in edge computing scenarios needs interoperable monitoring to share information among heterogeneous frameworks. While it is easy to design a cloud-specific monitoring platform, implementing a generic monitoring solution able to work with multiple cloud infrastructure providers remains a challenging issue.
• Quickly react to dynamic resource management changes over time: especially for time-critical edge computing applications, the monitoring solution must rapidly detect and collect information about the changing environment. Compared to the centralized cloud datacenter approach, edge computing needs a more agile monitoring system, especially because the edge of the network is a highly dynamic environment where end-devices may frequently become available or unavailable, devices are moving, and edge nodes change state over time.
• Support monitoring of all types of hardware virtualization: cloud monitoring solutions within edge computing frameworks should cover all kinds of hardware virtualization, as well as operating systems, across federated clouds.
• Offer an API to expose monitoring data: all monitoring tools working at the VM, container, network link or application level must be able to provide an API to expose runtime statistics about a monitored entity, and this remote API should be externally accessible by other entities.
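The API requirement above can be illustrated with a minimal read-only HTTP endpoint built on the Python standard library. The metric names, values, path and port are illustrative assumptions; a real agent would keep the snapshot continuously updated:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory snapshot that a real monitoring agent would keep updated;
# these metric names and values are placeholders.
METRICS = {"cpu": 0.42, "memory_mb": 512, "rtt_ms": 18.3}

class MetricsAPI(BaseHTTPRequestHandler):
    """Read-only endpoint exposing the agent's latest measurements."""
    def do_GET(self):
        if self.path == "/metrics":
            body = json.dumps(METRICS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

def serve(port=8080):
    # Make the API externally reachable, e.g. by an adaptation engine.
    HTTPServer(("", port), MetricsAPI).serve_forever()

# serve()  # uncomment to run the agent endpoint
```

Any other entity can then poll `GET /metrics` over HTTP and parse the JSON body, which keeps the monitored entity decoupled from its consumers.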
The functional requirements at the network link quality level for monitoring systems within edge computing frameworks are as follows:

• Monitor the whole range of end-to-end network QoS properties: with regard to end-to-end link quality measurement, QoS attributes change constantly, and so network-layer parameters (mainly network throughput, delay, packet loss and jitter) need to be closely monitored in the edge computing environment.
• Support on-demand network configuration: monitoring solutions within edge computing frameworks must be able to support programmable networks, which help in managing and controlling virtual network resources dynamically.

• Able to reach the device in spite of filters and firewalls: it is common for specific types of traffic, such as ICMP or SNMP packets, to be filtered in private administrative domains due to various security concerns. In such cases, the monitoring solution should automatically change its mode of operation via different communication protocols to be able to find the user's location, reach the device, and then measure end-to-end link quality metrics to the user. To this end, the first requirement is that the network monitoring solution should be able to access the IP address (taking into account any network address translation) of the user's device.

• Consider the user's link conditions (e.g. network quality): within edge computing frameworks, beneficial dynamic adaptations to the user's network conditions can be accomplished by utilizing network edge-specific monitoring information. Monitoring and identifying the network quality of connections between end-users and application servers makes decision or control tasks possible which can continuously adapt the deployed service for optimal performance. For instance, in situations where more than one server can provide the service at the edge of the network, the monitoring solution can contribute towards choosing the best application instance to connect to clients based on their network edge conditions, such as higher resolution via a more stable connection for streaming applications.
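As an illustration of adapting to the user's link conditions, a monitoring component could rank the candidate edge servers by their measured link metrics for a given client and re-connect the client to the best one. The scoring weights below are purely illustrative assumptions and would be tuned per application:

```python
def score(link):
    """Lower is better: penalize delay, jitter and loss; reward throughput.
    The weights are illustrative, not from any surveyed system."""
    return (link["delay_ms"] + 2 * link["jitter_ms"]
            + 1000 * link["packet_loss"] - link["throughput_mbps"])

def best_server(links):
    # links: mapping of server name -> measured edge-link metrics for this user
    return min(links, key=lambda s: score(links[s]))
```

Given fresh per-client measurements for each candidate edge node, `best_server` yields the instance to which the client should be (re-)connected.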
Other functional requirements, particularly at the application level, for cloud monitoring systems in edge computing environments are listed below:
• Deal with the application topology and reconfiguration: The monitoring system should be aware of the topology and of any reconfiguration of cloud-based applications caused by adaptation actions during execution within edge computing frameworks, in order to support the management of metrics collection and various analytics. For example, a monitoring system may know where the distributed services forming the whole application are running and how they are connected to each other.
• Define an effective measurement interval based on the application performance: Measurement intervals for monitoring different types of applications according to their runtime performance should be determined optimally. Short measurement intervals may increase the intrusiveness of monitoring tools, while slow sampling rates may diminish the accuracy of monitoring information.
• Support multi-tier applications: Within edge computing frameworks, in contrast to traditional centralized cloud architectures, a single request sent by an end-user leads to multiple interactions among distinct application tiers across different locations. Breaking a multi-tier application's response time down into its individual constituent durations is critical for application performance diagnosis in edge computing environments. Therefore, any monitoring solution in edge computing frameworks should be able to collect monitoring parameters from multiple tiers of the application. Moreover, future monitoring generations may also consider multi-tier aspects of the application (e.g. monitoring information about different dynamic provisioning policies at each tier) and the interactions between different tiers.
• Adapt to time-varying application adaptation goals: If the goals of application adaptation change over time, the monitoring solution may need to be dynamically adapted while keeping the application running during the monitoring upgrade. In such cases, the implementation of a monitoring approach that collects an individual application-specific metric needs to be dynamically reprogrammed, for example because of a new definition of application response time.
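The measurement-interval trade-off in the list above (intrusiveness versus accuracy) can be sketched as a simple controller that backs off while a metric is stable and samples faster when it becomes volatile; the function and its parameters are illustrative assumptions, not taken from any surveyed tool.

```python
def next_interval(current, prev_value, new_value,
                  min_s=5, max_s=60, tolerance=0.05):
    """Pick the next sampling interval (seconds) for a metric.

    Shorten the interval when the metric moved more than `tolerance`
    (relative change) since the last sample; widen it while stable."""
    change = abs(new_value - prev_value) / max(abs(prev_value), 1e-9)
    if change > tolerance:
        return max(min_s, current // 2)   # volatile: sample faster
    return min(max_s, current * 2)        # stable: back off
```

Such a rule keeps the probe lightweight on quiet nodes while preserving accuracy during workload spikes.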
From the edge computing point of view, all the important non-functional requirements of a monitoring system needed to support dynamic adaptation of edge computing applications, as discussed in the literature reviewed in Section 2, are listed below:
• Scalability: A scalable monitoring system can handle a remarkable number of monitored resources and services ( Clayman et al., 2010b ). This feature is a very significant property in edge computing environments because of the necessity of managing a wide variety of parameters that need to be monitored across different layers of the framework. The runtime orchestration of VMs or containers in edge architectures, possibly including thousands of nodes, requires a monitoring solution to be scalable in order to deliver the monitoring data in a flexible and timely manner. Due to the distributed nature of edge computing applications, existing centralized monitoring tools that lack scalability are not suitable, because they fail to distribute the monitoring load, which leads to a single point of failure ( Xu et al., 2016 ).
• Non-intrusiveness: The edge computing viewpoint attempts to provide a set of application services in a lightweight, simple way, and any edge computing monitoring solution should follow this lightweight methodology. Therefore, a monitoring implementation should take a non-intrusive approach, given the necessity of being lightweight with respect to the ordinary flows of the application and the infrastructure ( Taherizadeh et al., 2016b ). Consequently, a low-overhead monitoring tool, achieved by adopting minimal processing, memory capacity and communication traffic, is essential for increasing efficiency in such environments ( Agelastos et al., 2014 ).
• Interoperability: Edge computing aims at the automatic and cooperative deployment and composition of application services, interconnected over both centralized cloud and edge infrastructures within highly distributed or potentially federated environments. Companies can replicate their application components in facilities hosted by different cloud infrastructure providers to balance services, increase availability and reliability, or decrease response time under various network conditions and varied amounts of traffic ( Grozev and Buyya, 2013 ).
Therefore, what will be needed in the future is interoperability of monitoring systems for highly adaptive cloud applications. Unfortunately, monitoring tools provided by IaaS providers are usually specific to the underlying infrastructure and are not able to monitor an application running on other cloud providers' infrastructures ( Alhamazani et al., 2015 ).
• Robustness: Individual parts of a monitoring system, or even the network links connecting them, may fail in ways that prevent the monitoring system from staying up and running continuously. One of the significant challenges for modern monitoring tools is that they should be capable of detecting vulnerabilities in an environment and adapting to a new situation in order to continue operating ( Fatema et al., 2014 ). It follows that considerable attention is needed to develop more robust cloud monitoring tools which help to cope with different failure scenarios during execution. Especially in the edge computing paradigm, each application component can have a conversation path to other components, which makes the overall system an intricate network of communications. Therefore, analyzing and solving problems for a distributed, federated system can be cumbersome and potentially infeasible without robust monitoring tools.
• Live-migration support: Nowadays, virtualization technologies offer a variety of resource management options such as VM/container creation, deletion, and live-migration ( Liaqat et al., 2016 ). With the growing maturity of edge computing frameworks, the demand for highly available applications, consisting of a set of independently deployable, modular, small services, is increasing. To this end, live migration of services is a highly desirable feature of such solutions. In a live-migration scenario, virtualized services can migrate from one host to another at any time without stopping operations. Accordingly, the challenge in this regard is how the monitoring system can adjust to the new environment and changing network conditions in order to prevent faults ( Toosi et al., 2014 ).

Analysis of monitoring tools based on identified taxonomy
As explained before, there are four levels of monitoring (VM, container, end-to-end link quality and application) needed for self-adaptive edge computing applications. In this regard, the current section has been divided into two subsections in order (I) to describe different widely-used cloud monitoring tools to contribute to the taxonomical analysis and (II) to compare their features based on the taxonomy presented in Section 3, respectively. These comprehensive presentations offer the opportunity to achieve an appropriate technical conception of monitoring tools operational within edge computing frameworks. Additionally, the information discussed helps in choosing the most appropriate monitoring tool for developing a fully qualified application using edge computing frameworks, based on different adaptation objectives.

Cloud monitoring tools
There are many tools that offer continuous monitoring and visibility for cloud applications. The next subsections describe different types of widely-used monitoring tools usable in edge computing frameworks and outline the advantages and disadvantages of each.

Zenoss 19
Zenoss is an open-source agent-less monitoring platform based on the SNMP protocol; it monitors networks, servers, applications and services. The main functionality of Zenoss is monitoring principal characteristics of the cloud, such as availability, inventory, configuration, performance and system-related events. It has an open architecture that enables consumers to customize it based on their needs ( Gupta, 2015 ). This collection/monitoring tool is widely used in enterprise solutions and provides a user interface through which users can configure and monitor the system ( Telesca et al., 2014 ). It also provides statistics about the number and identifiers of the hosts and tenants available in the system.
Zenoss is able to use predictive thresholds to protect the edge of the network. In this way, it can recognize when an edge node's network interface or link suddenly carries considerably increased traffic. Furthermore, it is able to filter predefined packets (e.g. based on network geography) away from the nodes at the edge of the network. However, it is not robust enough and cannot support the live-migration of services, which is essential within edge computing frameworks. Moreover, the product has a limited open-source version for monitoring, and the full version requires payment, restricting its applicability in research contexts.

Ganglia 20
Ganglia ( Massie et al., 2004 ) is a robust distributed monitoring tool that scales to high-performance computing environments such as grids and clusters. It is now being extended to private and public cloud monitoring (e.g. via sFlow21). There are also cloud-based tools integrated with Ganglia, such as the Apache Cassandra database, usable in edge computing use cases ( Confais et al., 2016 ). However, this monitoring platform does not appear to focus specifically on edge computing so far. This monitoring system enables users to view runtime and historical measured data (such as memory utilization and CPU load) for all running VMs/machines being monitored. Ganglia uses widely adopted technologies and protocols such as XDR (External Data Representation), XML (Extensible Markup Language) and RRDtool to store and visualize time-series monitoring data. Its implementation has been designed to run on different operating systems and processor architectures, and it is currently used on a large number of clusters all over the world. Since Ganglia is mainly designed to collect infrastructure monitoring data about machines in a high-performance computing cluster and display them as a series of graphs in a Web interface, it has a drawback for edge computing, as it is not appropriate for bulk data transfer (no congestion avoidance, windowed flow control, and so forth).

Zabbix 22
The Zabbix monitoring solution ( Tader, 2010 ) is designed around a server/agent architecture. The Zabbix server runs on a standalone machine and collects and aggregates monitoring data sent by the Zabbix agents. BonFIRE23 is one of the main projects for which this open-source monitoring software implementation was designed. The Zabbix solution supports an alerting system that triggers when predefined events and conditions occur, for instance when memory utilization exceeds 80%. These alarms are beneficial, as the triggers can initiate adaptation plans such as elasticity actions. SQL databases are used to store measured metrics, and a Web front-end and an API are provided to access the data. The Zabbix monitoring tool is primarily implemented to monitor network parameters and network services. The Zabbix agent is quite resource-efficient and can reside on edge nodes in a way that is non-intrusive to edge network functions. Because the native Zabbix agent has been developed in the C language, it has a relatively small footprint. However, the robustness of Zabbix can be a weak point: it can become unstable and require restarting on some occasions ( Simmonds and Harrington, 2009 ). As another important disadvantage, the auto-discovery feature of Zabbix can be inefficient ( Murphy, 2008 ). For example, it may sometimes take more than five minutes for Zabbix to discover that a host is no longer available in the network. This delay can be a serious problem for any runtime self-adaptation scenario, especially if conditions are changing rapidly.

Nagios 24
The Nagios monitoring system is an open-source solution for monitoring network and infrastructure resources. It provides a notification system for resources such as machines, switches, networks and so on. Nagios is able to alert administrators when anything fails, and it also sends notifications when an issue has been detected or resolved. For Nagios to be deployed, it requires complex manual configuration ( Mongkolluksamee, 2010; Issariyapat et al., 2012 ), and scalability is not its strong point either. Nagios uses many buffers, pipes and queues that could cause bottlenecks if the system were to monitor a large-scale cloud environment. In this context, therefore, considerable modification would be necessary. As a consequence, it may not be suitable, as it stands, as a monitoring tool for dynamic environments such as edge computing frameworks.

OpenNebula 25
OpenNebula is a complete cloud management platform. Its monitoring part observes the cloud infrastructure to measure the state of resources such as VMs and physical machines ( Gorbil et al., 2014 ). OpenNebula's monitoring tool collects individual metrics via several static sensing elements, called monitoring probes, which run on the resources. The collection mode can use a push or a pull method. In push mode, the initiator is the monitoring probe: the probe sends QoS information only when it detects that a changing metric has exceeded its threshold. In pull mode, the monitoring manager repeatedly queries each probe. This approach is appropriate for keeping maximum consistency between the probes and the monitoring manager in a real-time environment involving a small number of nodes, but it may not prove scalable for high update frequencies or large-scale infrastructures. OpenNebula enables users to customize and create simple monitoring probes, which apply just the pull mechanism to send the measured information. OpenNebula offers the OneGate component to allow VMs to push monitoring data to the manager in order to collect application-level parameters and recognize problems in running applications. Note, however, that this module has been designed for small-scale cloud environments only.
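The push-mode behavior described above (reporting only when a metric crosses its threshold, rather than on every sample) can be sketched as follows; the class and its interface are illustrative assumptions, not OpenNebula's actual probe API.

```python
class PushProbe:
    """Minimal sketch of a push-mode monitoring probe: it signals an
    update only when the observed metric crosses its threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.above = False  # last reported side of the threshold

    def observe(self, value):
        """Return True when an update should be pushed, i.e. the
        metric has just crossed the threshold in either direction."""
        now_above = value > self.threshold
        fire = now_above != self.above
        self.above = now_above
        return fire
```

Compared with pull mode, this keeps traffic proportional to the number of threshold crossings instead of the number of probes times the polling frequency.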
OpenNebula is able to support VM migration and can generally be considered a solution for dynamic environments, such as the VM-based use cases posed at the edge of the network ( Minerva and Crespi, 2017 ). The OnLife Project ( Montero et al., 2017 ), using OpenNebula, proposes an edge computing design to dynamically migrate computing services closer to the users in order to ensure low-latency interactions between IoT services and users. However, the primary barrier to adoption of OpenNebula's monitoring part is cloud interoperability, which remains the main issue to be addressed.

PCMONS
Private Cloud MONitoring System (PCMONS) ( Chaves et al., 2011 ) is a monitoring system aimed at meeting the need for open-source monitoring technologies for private clouds. The monitoring approach is compatible with the Eucalyptus IaaS platform. The main feature of PCMONS is its extensibility, which allows it to adapt to new environments, for instance by incorporating new monitoring metrics. However, it has several disadvantages: as PCMONS is a Nagios module, it inherits Nagios' performance and scalability issues, which preclude its applicability to huge cloud infrastructures; and it works only for monitoring infrastructure in private clouds.

DARGOS
DARGOS ( Povedano-Molina et al., 2013 ) is a decentralized resource monitoring solution specifically designed for cloud computing infrastructures. DARGOS uses Node Monitor Agents (NMA) and Node Supervisor Agents (NSA). The NMAs are responsible for gathering monitoring information from the VMs and forwarding it to the NSAs. The NSAs collect the measured information received from monitored resources and are able to send monitoring data to the cloud administration. DARGOS currently runs on the OpenStack platform, which has been modified (in the Nova project) to support it. Consequently, DARGOS is mainly confined to the cloud infrastructure it has been integrated with, as it depends on the architecture of the OpenStack Nova implementation. Moreover, DARGOS is neither robust nor does it support live-migration of services, both of which are essential for edge computing frameworks. The set of metrics monitored by DARGOS is currently quite limited due to its early development status.

Lattice
The RESERVOIR 26 project ( Clayman et al., 2012 ) provides technologies by which services and resources can be efficiently provisioned, monitored, managed and migrated across federated clouds to maximize their exploitation and minimize their costs. RESERVOIR introduces Lattice ( Clayman et al., 2010a ) as a non-intrusive monitoring framework that can be integrated into many monitoring systems. Lattice is an open-source monitoring system natively oriented to infrastructure monitoring and mainly implemented for highly dynamic cloud environments comprising a large number of resources. However, RESERVOIR does not address the issue of directly providing monitoring information to cloud customers. The functionalities of Lattice are focused on the collection and distribution of monitoring data through either multicast or the UDP protocol. Therefore, Lattice does not provide functionalities for visualization, evaluation and automated alerting ( Katsaros et al., 2011a ).
b Although Ganglia monitors built-in metrics (e.g. load average, CPU utilization, disk free, etc.), it is possible to extend Ganglia's metric library to measure application-specific metrics.
c DARGOS implements built-in application monitoring solutions to measure the status of the Apache web server and the MySQL database server. The application-related statistics to be monitored are predefined, such as the number of requests per second and their uptime. However, the DARGOS developers claim that it is easily extensible to new monitoring environments and can incorporate new metrics.
Lattice conceptually offers a design intended to enable application providers to build a monitoring solution fitting their own unique use case in distribution frameworks such as edge computing. However, since a library of monitoring probes to be easily reused in the Lattice platform has not been provided so far, it does not stand out as a monitoring system within edge computing frameworks.

JCatascopia
JCatascopia ( Trihinas et al., 2014 ), implemented in the Java programming language, is a monitoring tool able to monitor federated clouds operating on different cloud providers. It can retrieve heterogeneous monitoring data both at the infrastructure level (e.g., memory and CPU) and at the application level (e.g., service availability and throughput). JCatascopia is able to adaptively filter out measured values of monitored metrics that show only small delta differences from prior values, in order to reduce storage and network overhead. Another interesting feature is the possibility to adapt the monitoring activity after machine migration. To this end, each message transmitted to the monitoring server contains the IP address of the monitored resource, so the server is notified of each change. JCatascopia is neither aimed at directly computing cost assessments, nor is it capable of recognizing the topology of a deployed application.
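The adaptive filtering idea, forwarding a sample only when it differs noticeably from the last forwarded value, can be sketched as below; the function name and the relative-delta criterion are our own illustrative assumptions, not JCatascopia's actual implementation.

```python
def delta_filter(samples, delta=0.02):
    """Forward a sample only when it differs from the last forwarded
    value by more than `delta` (relative change), dropping the
    near-duplicate readings that dominate steady-state monitoring."""
    kept, last = [], None
    for v in samples:
        if last is None or abs(v - last) / max(abs(last), 1e-9) > delta:
            kept.append(v)
            last = v
    return kept
```

On a stable metric this can suppress most updates while still reporting every significant change, which is exactly the storage and network saving described above.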
In the SWITCH project, JCatascopia was chosen as the baseline technology for developing a monitoring system and was extended in this work to fulfil the requirements of containerized applications. Since JCatascopia is written in Java, each container instance that includes a monitoring probe requires certain packages and a certain amount of memory for a Java Virtual Machine (JVM), even if the monitored application running alongside the probe in the container is not programmed in Java. Therefore, containerized monitoring probes in the SWITCH project have been implemented through the StatsD protocol,27 which is available for many programming languages such as C/C++.
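For illustration, the StatsD wire format is a plain-text line of the form `<bucket>:<value>|<type>` sent over UDP (commonly to port 8125), which keeps a probe free of JVM dependencies; the metric name below is a made-up example, and the helper functions are our own sketch rather than part of any SWITCH component.

```python
import socket

def statsd_line(bucket, value, mtype="g"):
    """Build one metric in the plain-text StatsD format
    '<bucket>:<value>|<type>', where the type is e.g. 'c' (counter),
    'g' (gauge) or 'ms' (timer)."""
    return f"{bucket}:{value}|{mtype}".encode()

def statsd_send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a StatsD daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (host, port))
```

Usage: `statsd_send(statsd_line("app.response_time", 320, "ms"))` would report a hypothetical response-time sample from inside a container in any language with UDP support.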

Tower 4Clouds 28
Tower 4Clouds ( Miglierina and Nitto, 2016 ), a multi-cloud monitoring platform developed as part of the MODAClouds project,29 collects metrics at the VM and application levels. Self-registering components named data collectors measure metrics and send the monitoring data to a central entity called the data analyzer. Tower 4Clouds is a modular platform which allows application providers to build their own monitoring data collectors for custom metrics. It stores the monitoring data in InfluxDB or Graphite, performs analysis on the stored data, and shows results at runtime through Web-based user interface tools such as Grafana. This open-source monitoring platform continuously checks a set of monitoring rules which determine the application health and QoS constraints. If any rule evaluates to true, an associated action is triggered, such as notifying other components (e.g. a self-adapter) by performing REST calls. These rules can be automatically generated through the MODAClouds Integrated Development Environment (IDE) or defined by application designers. This monitoring platform is able to filter monitoring data at various levels of abstraction, handle the heterogeneity of the resources being monitored, and autonomously deal with scaling actions or live-migration of services from one VM to another. It should be noted that, as a data collector in Tower 4Clouds is written in Java, it suffers from the same problem as native JCatascopia: specific package dependencies and a certain amount of memory are required for a JVM.
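A rule-driven trigger of the kind described above can be sketched as a simple evaluator that maps violated rules to adaptation actions; the rule tuple format and the action names are illustrative assumptions, not Tower 4Clouds' actual rule language.

```python
import operator

# comparison operators a monitoring rule may use
OPS = {">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}

def check_rules(metrics, rules):
    """Evaluate monitoring rules of the form
    (metric_name, operator, limit, action) against the latest metric
    values; return the actions to trigger, e.g. as REST calls
    notifying a self-adapter."""
    return [action
            for name, op, limit, action in rules
            if name in metrics and OPS[op](metrics[name], limit)]
```

In a real deployment the returned actions would be dispatched as notifications rather than returned to the caller.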
Although the MODAClouds consortium planned to provide Docker container images as another way of obtaining a design-time platform for building cloud applications, they did not extend Tower 4Clouds to be capable of container-level monitoring as well. On the other hand, it is a highly composable open-source platform, which makes it easier to extend or integrate with containerized and edge computing scenarios.

Cloud monitoring tools' support for functional requirements
In Table 8, a list of the previously-mentioned cloud monitoring systems and their functional requirements in a general sense is provided. These features are investigated in order to find an appropriate baseline technology for the needs of monitoring applications deployed on edge computing frameworks. The monitoring tools compared in this table are not aimed primarily at monitoring Docker containers; however, some of them have a module or extension to monitor Docker containers. Moreover, in detail, Table 9 underlines which functional requirements in the taxonomy described earlier have been addressed by each of the cloud monitoring systems, from a free and open-source software viewpoint.
a Zenoss Service Dynamics is a monitoring platform able to perform multi-tenant operations. However, it is a commercial product.
b Auto-discovery in Zenoss must be activated manually; it cannot run automatically on a periodic basis.
c Ganglia lacks the capability of auto-discovery at the inter-cluster level.
d In order to activate auto-discovery, some configurations need to be managed manually in the Zabbix server.
e Nagios has enterprise monitoring solutions with auto-discovery that are not free.
f OpenNebula has an approach called Hook Manager to trigger custom scripts on changes (e.g. SHUTDOWN, BOOTING, etc.) in the state of VMs or hosts. However, the usage of hooks is mainly aimed at higher availability and infrastructure strategies. Moreover, OpenNebula offers the OneFlow component to allow administrators to specify auto-scaling rules based on monitoring metrics.
g Automated alerting in JCatascopia has been provided; however, this part of the project is proprietary and not publicly released.
h Nagios is not suitable for monitoring environments that need a high sampling rate ( Katsaros et al., 2011b; Paterson, 2010 ).
i The pull mode in OpenNebula is not usable for large-scale environments with high update frequencies. For example, with around 50 VMs, the proper monitoring period would be around 5 minutes. The push mode is more scalable for such large clouds.
j Fping ZenPack is able to measure latency and packet loss.
k Fping ZenPack is a ping-like monitoring tool which uses just the ICMP protocol to examine whether a target node is responding.
l In comparison with Zenoss, Zabbix is able to measure more network-related parameters.
m Although Zabbix supports monitoring via TCP, SNMP and ICMP checks, as well as over IPMI, SSH, JMX, Telnet and custom protocols, it lacks automatic network traffic engineering and re-routing.
n There exist different Nagios plugins to monitor network download speed, bandwidth usage, network connections, packet loss, etc.
o Zenoss Control Center, which enables an application to run as a set of distributed services, knows where those services are running and how they are connected to each other.
p A future objective for the Lattice monitoring tool is the ability to dynamically reprogram measurements and change the monitoring data source at runtime ( Clayman et al., 2010a ).

Cloud monitoring tools' support for non-functional requirements
Table 10 presents the analysis of the essential non-functional requirements for the previously-mentioned cloud monitoring tools. The goal of the comparison is to specify and trade off the strengths, drawbacks and challenges encountered in the context of self-adaptive applications for edge computing.
a The Ganglia daemon, called gmond, which runs on each monitored node, adds overhead due to both XML event encoding and multicast updates ( Arabnejad et al., 2017 ).
b In Nagios, there are numerous service checks which are resource-intensive in terms of the overhead incurred by notable CPU and IO usage.
c Individual configurations in OpenNebula are needed to migrate VMs from one node to another. For example, if the hypervisor is KVM, libvirt's TCP has to be configured; also, a folder must already be shared on both the source and destination nodes.
d Since PCMONS has been implemented as a module for the Nagios monitoring system, both principally behave in the same way.
e As a monitoring probe is written in Java, it requires some packages and a certain amount of memory for a JVM.
f As a data collector is written in Java, it needs some packages and a specific amount of memory for a JVM.

Conclusions and future outlook
Monitoring for adaptive edge computing applications has recently gained considerable attention in the context of the "future Internet", as a field that still needs to be fully scrutinized and improved. Since the self-adaptation of applications developed for edge computing frameworks is at an early stage of research and development, we, as members of the cloud community, believe this review paper can serve as an important reference for further research in this field. The paper also highlights several challenges and technical problems in existing cloud monitoring approaches for edge computing purposes, which could be addressed to enhance the performance of such adaptive applications.
We have also derived a taxonomy of the main functional and non-functional requirements that cloud monitoring systems should address, and we have related them to the contributions provided in the literature so far. More importantly, this review paper has compared several widely used cloud monitoring tools, both open source and commercial, along with their capabilities and shortcomings, as well as how these monitoring systems meet the varied requirements. This comprehensive comparison offers companies providing services based on edge computing an opportunity to gain insight into monitoring tools usable in different but interdependent virtualization layers. However, our comparison shows that an integrated solution that fully monitors all the layers of an edge computing scenario is currently unavailable. Such a solution would need to provide some abstraction layer so that there is at least some commonality in the way these different layers are monitored. Moreover, some requirements have not been fully met by any of the existing cloud-based monitoring technologies.
Another interesting direction is the development of more monitoring systems for container-based virtualization technologies, which provide a lightweight mechanism for initiating, deploying, scaling and moving services between infrastructures within edge computing frameworks. Container management systems such as Kubernetes or OpenShift Origin are lightweight platforms able to orchestrate containers and automatically provide horizontal scalability of applications. However, their native scaling approaches are principally based on CPU usage, regardless, for example, of how workload intensity or application performance behaves. Therefore, an interesting field for further studies would be the investigation of additional monitoring systems, preferably ones capable of being integrated with container management systems and published under the Apache 2 license, for easier customization and integration into other systems.
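To illustrate what scaling on an application-level metric rather than CPU alone could look like, the sketch below generalizes the proportional rule used by Kubernetes' Horizontal Pod Autoscaler, desired = ceil(current * observed / target), to an arbitrary metric such as request rate or response time; the function itself and its bounds are our illustrative assumptions.

```python
import math

def desired_replicas(current, observed, target, min_r=1, max_r=10):
    """Proportional autoscaling rule generalized beyond CPU usage:
    desired = ceil(current * observed / target), clamped to
    [min_r, max_r]. `observed` could be, e.g., mean requests/s per
    replica or mean response time reported by the monitoring system."""
    if observed <= 0 or target <= 0:
        return current  # no usable signal: keep the current count
    return max(min_r, min(max_r, math.ceil(current * observed / target)))
```

Feeding such a rule from an application-level monitoring source is precisely the kind of integration with container management systems suggested above.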

Table 1
Overview of research on different VM-level monitoring systems.

Table 2
Common set of container-level parameters.

Table 3
Overview of container-specific monitoring tools.

Table 4
Overview of research on different container-level monitoring systems.

Table 5
Overview of research on different link quality monitoring systems.

• Communications between/within cloud datacenters: With the remarkable growth in cloud-based applications, the volume of data exchanged between software components in different tiers deployed on cloud datacenters is rapidly increasing. Ensuring that these types of applications are able to offer favorable service quality has been a challenging issue, due to runtime variations in the network conditions intrinsic to connections between individually replicated and distributed application components across/within different cloud datacenters.
• Communications between IoT objects/users and edge nodes: Self-adaptive application providers have to dynamically adapt their services to the IoT device's and customer's network circumstances to provide high performance and a seamless experience.

Table 6
Overview of research on application-level monitoring systems.
conferencing services by creating network connections among Amazon's cloud datacenters, making use of the low latency and reduced packet loss that exist among its datacenters. The authors assess different network QoS metrics of connections between Amazon datacenters in different regions, which provide better network quality than the average Internet connection. Their implementation deals with trans-continental communications and consequently does not address traffic localization, including data exchanges between nearby peers within a single area. In some cases, such as Pokémon GO (one of 2016's most successful games), edge nodes distributed around a country (e.g. Australia) would only occasionally need to send data (only a subset of the scoring information) up to a central datacenter, whereas the game requires constant back-and-forth interactions between the users and the edge nodes in close proximity.

VM level or container level:
• Provide the functionality for visualization
• Able to filter measured values to diminish data exchanges
• Able to be tuned to any desired monitoring time interval
• Include capability of long-term storing of measured values
• Support scaling adaptation policies for large-scale dynamic environments
• Able to set up automated alerts
• Support different adaptation actions, e.g. service migration, etc.
• Automatic installation and configuration of monitoring system
• Able to be customized based on monitoring needs
Both VM level and container level:
• Independent from underlying cloud infrastructure provider
• Quickly react to the dynamic resource management changes over time
• Support monitoring of all types of hardware virtualizations
• Offer an API to expose monitoring data
End-to-end link quality level:
• Investigate the whole range of end-to-end network QoS properties
• Support on-demand network configuration
• Able to reach the device in spite of filters and firewalls
• Consider user's link conditions (e.g. network quality)
Application level:
• Able to deal with the application topology and reconfiguration
• Define effective measurement interval upon the application performance
• Support multi-tier applications
• Able to be adapted with time-varying application adaptation goals
Non-functional requirements

• Include capability of long-term storing of measured values:
• Support different adaptation actions, e.g. service migration, etc.: Within edge computing frameworks, monitoring systems should deal with uncertainties imposed by application re-contextualization (e.g. dynamic IP address management) due to adaptation actions such as VM or container live-migration.

Table 8
High-level analysis of functional requirements for cloud monitoring tools. a Zenoss has developed the Fping ZenPack as an extension to monitor network connections.

Table 9
Detailed analysis of functional requirements for cloud monitoring tools.

Table 10
Non-functional requirement analysis for cloud monitoring systems. The comparison in this table is based on the reviewed literature and on experiments conducted with the tools.