Smart Brix – A continuous evolution framework for container application deployments

Container-based application deployments have received significant attention in recent years. Operating system virtualization based on containers has become a popular mechanism to deploy and manage complex, large-scale software systems. Packaging application components into self-contained artifacts has brought substantial flexibility to developers and operation teams alike. However, this flexibility comes at a price. Practitioners need to respect numerous constraints ranging from security and compliance requirements, to specific regulatory conditions. Fulfilling these requirements is especially challenging in specialized domains with large numbers of stakeholders. Moreover, the rapidly growing number of container images that must be managed due to the introduction of new or updated applications and respective components leads to significant challenges for container management and adaptation. In this paper, we introduce Smart Brix, a framework for continuous evolution of container application deployments that tackles these challenges. Smart Brix integrates and unifies concepts of continuous integration, runtime monitoring, and operational analytics. Furthermore, it allows practitioners to define generic analytics and compensation pipelines composed of self-assembling processing components to autonomously validate and verify containers to be deployed. We illustrate the feasibility of our approach by evaluating our framework using a case study from the smart city domain. We show that Smart Brix is horizontally scalable and that the runtime of the implemented analysis and compensation pipelines scales linearly with the number of container application packages.


INTRODUCTION
In recent years, we have seen widespread uptake of operating system virtualization based on containers (Soltesz et al., 2007) as a mechanism to deploy and manage complex, large-scale software systems. Using containers, developers create self-contained images of application components along with all dependencies that are then executed in isolation on top of a container runtime (e.g., Docker, rkt, or Triton). By packaging application components into self-contained artifacts, developers can ensure that the same artifact is consistently used throughout the complete software release process, from initial testing to the final production deployment. This mechanism for application deployment has become especially popular since public repositories provide large numbers of container images, both base images for common Linux distributions (e.g., Ubuntu, CoreOS, CentOS, or Alpine) that users can extend with custom functionality, and prepared application images that can be used directly in a container deployment. Once uploaded to a repository, a container image is assigned a unique, immutable identifier that can subsequently be used to deterministically deploy the exact same application artifact throughout multiple deployment stages. By deploying each application component in its own container, practitioners can reliably execute multiple component versions on the same machine without introducing conflicts, as each component is executed in an isolated container.

However, since each container image must contain every runtime dependency of the packaged application component, each of these dependency sets must be maintained separately. This leads to several challenges for practitioners. Over time, the number of active container images grows due to the introduction of new applications, new application components, and updates to existing applications and their components. This growing number of container images inherently leads to a fragmentation of deployed runtime dependencies, making it difficult for operators to ensure that every deployed container continues to adhere to all relevant security, compliance, and regulatory requirements. Whenever, for instance, a severe vulnerability is found in a common runtime dependency, practitioners either have to manually determine if any active container images are affected, or initiate a costly rebuild of all active containers, irrespective of the actual occurrence of the vulnerability. We argue that practitioners need a largely automated way to perform arbitrary analyses on all container images in their deployment infrastructure. Furthermore, a mechanism is required that allows for the enactment of customizable corrective actions on containers that fail to pass the performed analyses. Finally, in order to allow practitioners to deal with the possibly large number of container images, the overall approach should be able to adapt its deployment to scale out horizontally.

In this paper, we present Smart Brix, a framework for continuous evolution of container applications. Smart Brix integrates and unifies concepts of continuous integration, runtime monitoring, and operational analytics systems. Practitioners are able to define generic analytics and compensation pipelines composed of self-assembling processing components to autonomously validate and verify containers to be deployed.
The framework supports both traditional mechanisms, such as integration tests, and custom, business-relevant processes, e.g., to implement security or compliance checks. Smart Brix not only manages the initial deployment of application containers, but is also designed to continuously monitor the complete application deployment topology to allow for timely reactions to changes (e.g., in regulatory frameworks or discovered application vulnerabilities). To enact such reactions to changes in the application environment, developers define analytics and compensation pipelines that will autonomously mitigate problems if possible, but are designed with an escalation mechanism that will eventually request human intervention if automated implementation of a change is not possible. To illustrate the feasibility of our approach, we evaluate the Smart Brix framework using a case study from the smart city domain. We show that the runtime of the implemented analysis and compensation pipelines scales linearly with the number of analyzed application packages, and that it adds little overhead compared to container acquisition times.

The remainder of this paper is structured as follows. In Section 2 we present a motivating scenario and relevant design goals for our framework. We present the Smart Brix framework in Section 3, along with a detailed discussion of the framework components. In Section 4 we evaluate our approach using a case study from the smart city domain. Related work is discussed in Section 6, followed by a conclusion and outlook for further research in Section 7.

MOTIVATION
Our work is motivated by URBEM, a smart city research initiative for the city of Vienna that is built around a smart city loop (Schleicher et al., 2015b), which is depicted in Fig. 1. This loop outlines a reactive system that enables stakeholders to make informed decisions based on the models and analyses of interdisciplinary domain experts, who in turn can access the large amounts of data provided by smart cities. In URBEM, this network consists of experts in the domains of energy, mobility, mathematics, building physics, sociology, as well as urban and regional planning. URBEM aims to provide decision support for industry stakeholders to plan for the future of the city of Vienna and represents a Distributed Analytical Environment (DAE) (Schleicher et al., 2015c).

The experts in this scenario rely on a multitude of different models and analytical approaches to make informed decisions based on the massive amounts of data that are available about the city. In turn, these models rely on a plethora of different tools and environments, leading to complex requirements in terms of providing the right runtime environment for them to operate. The tools used range from modern systems for data analytics and stream processing, like Cassandra and Spark, to proprietary tools developed by companies and research institutes, with a large variance in specific versions and requirements to run them. Additionally, these domains have to deal with a broad range of different stakeholders and their specific security and compliance requirements. Models sometimes need to tailor their runtime environment to specific technology stacks to ensure compliance or to be able to access the data they need.

Managing and satisfying all these requirements is a non-trivial task and a significant factor hindering broader adoption. Therefore, this environment is an ideal candidate for the advantages of container-based approaches. Operations teams that need to integrate these models no longer need to be concerned with runtime specifics. Experts simply build containers that can be deployed in the heterogeneous infrastructures of participating stakeholders.

However, several challenges remain. In URBEM, the team of experts with their plethora of different models created over 250 different images that serve as the foundation for running containers. The models in these containers are fueled by data from several different stakeholders in the scenario, ranging from research institutions in the City of Vienna to industry stakeholders in the energy and mobility domains.

Each of them mandates a very distinct set of security and compliance requirements that need to be met in order to run them. These requirements in turn are subject to frequent changes, and the containers need to be able to evolve along with them. Additionally, even though the container approach provides isolation from the host system, it is still vital to ensure that the containers themselves are not compromised. This calls for means to check the systems running inside the containers for known vulnerabilities, an issue that is subject to heavy and fast-paced change, again requiring corresponding evolution. A recent study shows that in the case of Docker, depending on the version of the images, more than 70% of the images show potential vulnerabilities, with over 25% of them being severe. Last but not least, these containers need to comply with certain non-functional requirements that arise from the specific situations they are applied in. This calls for the ability to constantly check containers against certain runtime metrics that need to be met in order to ensure that these systems are able to deliver their expected results within stakeholder-specific time and resource constraints.
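To make such a vulnerability check concrete, the sketch below shows one possible realization; it is a minimal illustration rather than the paper's implementation, assuming Debian/Ubuntu-based images (for dpkg-query) and a simplified advisory structure standing in for a real CVE feed.

```python
# Illustrative sketch (not the paper's implementation): scan the
# packages inside a Debian/Ubuntu-based Docker image against a list
# of known-vulnerable package versions.
import subprocess

def installed_packages(image: str) -> dict:
    """Return {package: version} by querying dpkg inside the image."""
    out = subprocess.check_output(
        ["docker", "run", "--rm", image,
         "dpkg-query", "-W", "--showformat=${Package} ${Version}\\n"],
        text=True)
    return dict(line.split(" ", 1) for line in out.splitlines() if line)

def vulnerable(packages: dict, advisories: dict) -> list:
    """advisories maps package name to a set of vulnerable versions;
    a real implementation would consult a CVE/security-tracker feed."""
    return [(name, version) for name, version in packages.items()
            if version in advisories.get(name, set())]

if __name__ == "__main__":
    advisories = {"openssl": {"1.0.1f-1ubuntu2"}}  # illustrative entry
    findings = vulnerable(installed_packages("ubuntu:14.04"), advisories)
    for name, version in findings:
        print(f"potentially vulnerable: {name} {version}")
```

A production variant would additionally compare version ranges rather than exact version strings.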

All these factors lead to a complex environment that calls for the ability to easily adapt and evolve containers along with their ever-changing requirements. Specifically, we identify the following requirements in the context of URBEM:

• The ability to check containers for known vulnerabilities as well as for compliance with stakeholder-specific security, regulatory, and runtime requirements.

• The ability to mitigate issues and evolve these containers based on the results from the previously mentioned checks.

• An approach that is applicable in the context of operations management, while still enabling the participation of experts in both checking and evolution.

• An approach that can be applied to existing deployments as well as utilized to test new ones.

THE SMART BRIX FRAMEWORK
In this section, we introduce the Smart Brix framework for continuous evolution of container-based deployments, which addresses the previously introduced requirements. We start with a framework overview, followed by a discussion of the framework's main components. The components within the Analyzer Facet and Compensation Facet are managed as self-assembling components, an approach we already successfully applied in previous work. Each of these components follows the Command Pattern (Gamma et al., 1994) and consists of multiple processors that are able to accept multiple inputs and produce exactly one output. This functional approach enables a clean separation of concerns and allows us to decompose complex problems into manageable units.

Fig. 3 illustrates an example of auto-assembly within the Analyzer facet. We see a set of processors, where each processor is waiting for a specific type of input and clearly specifies the output it produces.

The processors use a message-oriented approach to exchange input and output data, where each output and input is persistently available in the message queue and accessible by any processor. In this example, we perform an analysis of a custom-built Debian-based container that hosts the Apache HTTPD server.

There are two potential processors for the input Artifact, each of them able to handle a different container format. Since in our example the Artifact is a Docker container, only the Docker Analyzer reacts and produces a Docker Image as output. In the next step, two further processors react to the produced Docker Image, among them one that analyzes the Docker base image.

Additionally, the components in the Analyzer and Compensation Facets follow the principle of Confidence Elasticity, which means that a component or processor produces a result that is augmented with a confidence value (c ∈ ℝ, 0 ≤ c ≤ 1), with 0 representing no certainty and 1 representing absolute certainty about the produced result. This allows for the specification of acceptable confidence intervals for the framework, which augment the auto-assembly mechanism. The confidence intervals are provided as optional configuration elements for the framework. In case the provided confidence thresholds are not met, the framework follows an escalation model to find the next component or processor that is able to provide results with higher confidence, until it reaches the point where human interaction is necessary to produce a satisfactory result (illustrated in Fig. 4). Each processor p_i from the set of active processors P_a provides a confidence value c_i. We define the overall confidence value of all active processors as c_a = ∏_{p_i ∈ P_a} c_i. The compensation stops when c_a meets the specified confidence interval of the framework, or when a processor represents a human interaction, which has a confidence value of c_i = 1. Analyzers pass the produced Artifacts to the Dependency Manager, which is responsible for storing them in the Dependency Repository.
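The following sketch condenses the auto-assembly and confidence-elasticity mechanisms into compact, runnable form. It is an illustration under assumptions: the class and processor names, the in-memory work list (standing in for the persistent message queue), and the concrete confidence values are ours, not the framework's actual implementation.

```python
# Illustrative sketch of auto-assembly with confidence elasticity.
# Names, types, and confidence values are assumptions for demonstration.
from dataclasses import dataclass

@dataclass
class Artifact:
    kind: str          # e.g. "Artifact", "DockerImage"
    payload: object
    confidence: float  # c_i of the processor that produced it

class Processor:
    accepts = None     # input type this processor waits for
    produces = None    # output type it emits

    def process(self, artifact: Artifact) -> Artifact:
        raise NotImplementedError

class DockerAnalyzer(Processor):
    accepts, produces = "Artifact", "DockerImage"

    def process(self, artifact: Artifact) -> Artifact:
        # A real processor would inspect the container format here.
        return Artifact("DockerImage", artifact.payload, 0.9)

def auto_assemble(seed: Artifact, processors, lower_bound: float):
    """Hand each new artifact to every processor that accepts its
    kind and aggregate confidence as c_a = product of all c_i."""
    work, c_a = [seed], 1.0
    while work:
        current = work.pop(0)
        for p in processors:
            if p.accepts == current.kind:
                result = p.process(current)
                c_a *= result.confidence
                work.append(result)
    if c_a < lower_bound:
        # Escalation model: fall back to a more confident processor,
        # ultimately a human interaction with c_i = 1.
        print(f"c_a = {c_a:.3f} below {lower_bound}: escalate")
    return c_a

auto_assemble(Artifact("Artifact", "httpd-container", 1.0),
              [DockerAnalyzer()], 0.95)
```

For example, three active processors reporting confidences 0.9, 0.8, and 0.95 yield c_a = 0.684; with a configured lower bound of 0.7, the framework would escalate, ultimately to a human interaction with c_i = 1.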

The components in the Compensation Facet generate potential compensations for containers that have failed the performed analyses. To enable the auto-assembly mechanism, each processor within each component in the Analyzer and Compensation Facets declares the type of input it accepts and the output it produces.

EVALUATION
For our evaluation, we deployed a cAdvisor container on every instance to monitor the resource usage and performance characteristics of the running containers. Fig. 6 shows an overview of the deployed evaluation setup. We used cAdvisor only as a monitor of CPU, memory, and disk usage to rule out overloading our infrastructure. We also did not utilize any storage backend for cAdvisor, since this has shown to introduce significant overhead, which in turn would have skewed our results.

We first compared the overall runtime of our analyzers, specifically the difference between one-instance and two-instance deployments; the results are shown in Fig. 7. Based on the results, we see that our approach can be horizontally scaled over two nodes, leading to a performance improvement of around 40%. The fact that in our current evaluation setting we were not able to halve the overall runtime using two instances stems from several factors. On the one hand, we have a certain overhead in terms of management and coordination, including the fact that we only deployed one manager and storage asset. On the other hand, a lot of the runtime is caused by the acquisition time, which is clearly bound by network bandwidth.
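To put this bandwidth bound into perspective, a rough back-of-envelope model of acquisition time follows; the image count, average image size, and uplink speed are illustrative assumptions, not measured values from our evaluation.

```python
# Back-of-envelope model: image acquisition (pull) time is bounded by
# the shared uplink. All concrete numbers are illustrative assumptions.
def pull_time_hours(num_images: int, avg_image_mb: float,
                    uplink_mbit_per_s: float) -> float:
    """Time to transfer all images sequentially over one uplink."""
    total_mbit = num_images * avg_image_mb * 8
    return total_mbit / uplink_mbit_per_s / 3600

# e.g., 250 images at an assumed 200 MB each over a 100 Mbit/s uplink:
print(f"{pull_time_hours(250, 200, 100):.1f} hours")  # ~1.1 hours
```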

Since our infrastructure is equipped with just one 100 Mbit uplink that is shared by all cloud resources, this is a clear bottleneck. We also see that the majority of wall-clock time is spent on acquisition, and that images that were already pulled in a previous run do not need to be pulled again, hence reducing the acquisition time. Finally, we demonstrate that the average processing time of our framework is stable, as shown in Fig. 8. We further notice a small increase in average processing time for the 250-image set, which is caused by the fact that this set contains more images with large package counts, resulting in a slightly higher average processing time.

As illustrated in Table 1, our experiments showed that from the 150 images we were able to auto-compensate 34 images by reducing the number of vulnerabilities. This illustrates that even a rather simple strategy leads to a significant improvement of around 22.6%, which makes this a very promising approach. In a next step, we compared the overall runtime of our compensation handlers for the three tested sets; the results are shown in Fig. 9. We again can clearly see that the major amount of time is spent on acquisition.

RELATED WORK
The rapid adoption of container-based execution environments for modern applications enables increased flexibility and fast-paced evolution. Next to this fast-paced evolution of containers, new containers are deployed whenever functionality has to be added, which leads to massive amounts of containers that need to be maintained. While the container provides an abstraction on top of the operating system, it is still vital that the underlying system complies with policies and regulations to avoid vulnerabilities. However, checking the plethora of available environments and adapting them accordingly is not a trivial task. One line of related work argues that congruent, rebuild-based deployment is more efficient, more robust, and easier to implement than convergent approaches.

However, compared to our approach, the authors do not provide a framework for analyzing container application deployments that triggers corresponding compensation mechanisms based on identified issues. Another related system is Skyport, a container-based execution environment for scientific workflows. By employing Docker containers, Skyport is able to address software deployment challenges and deficiencies in resource utilization, which are inherent to existing platforms for executing scientific workflows. In order to show the feasibility of their approach, the authors add Skyport as an extension to an existing platform and were able to reduce the complexities that arise when providing a suitable execution environment for scientific workflows. In contrast to our approach, the authors solely focus on introducing a flexible execution environment, but do not provide a mechanism for continuously evolving container-based deployments. Although this work shares similarities with our approach, the authors do not provide a framework for testing container-based deployments that also supports semi-automatic compensation of found issues.

Next to scientific approaches, several industrial platforms have emerged that deal with the development and management of container-based applications, the most prominent being Tutum and Tectonic.

These cloud-based platforms allow building, deploying, and managing dockerized applications. However, in contrast to our approach, they do not provide a framework for continuously analyzing and evolving container application deployments.

CONCLUSION
The numerous benefits of container-based solutions have led to a rapid adoption of this paradigm in recent years. The ability to package application components into self-contained artifacts has brought substantial flexibility to developers and operations teams alike. However, to enable this flexibility, practitioners need to respect numerous dynamic security and compliance constraints, as well as manage the rapidly growing number of container images. In order to stay on top of this complexity, it is essential to provide means to evolve these containers accordingly. In this paper we presented Smart Brix, a framework enabling continuous evolution of container application deployments. We described the URBEM scenario as a case study in the smart city context and provided a comprehensive description of its requirements in terms of container evolution. We introduced Smart Brix to address these requirements, described its architecture, and presented a proof-of-concept implementation. Smart Brix supports both traditional continuous integration processes, such as integration tests, and custom, business-relevant processes, e.g., to implement security, compliance, or other regulatory checks. Furthermore, Smart Brix not only enables the initial management of application container deployments, but is also designed to continuously monitor the complete application deployment topology and allows for timely reactions to changes (e.g., discovered application vulnerabilities). This is achieved using analytics and compensation pipelines that autonomously detect and mitigate problems if possible, but are also designed with an escalation mechanism that will eventually request human intervention if automated implementation of a change is not possible. We evaluated our framework using a representative case study, which showed that the framework is feasible and provides an effective and efficient approach for container evolution.

As part of our ongoing and future work, we will extend the presented framework to incorporate more sophisticated checking and compensation mechanisms. We will integrate mechanisms from machine learning, specifically focusing on unsupervised learning techniques as a potential vector to advance the framework with autonomous capabilities. We also aim to integrate the Smart Brix framework with