The Do’s and Don’ts of Infrastructure Code: a Systematic Grey Literature Review

Context. Infrastructure-as-code (IaC) is the DevOps tactic of managing and provisioning software infrastructures through machine-readable definition files, rather than manual hardware configuration or interactive configuration tools. Objective. From a maintenance and evolution perspective, the topic has piqued the interest of practitioners and academics alike, given the relative scarcity of supporting patterns and practices in the academic literature. At the same time, a considerable amount of grey literature exists on IaC. Thus, we aim to characterize IaC and compile a catalog of best and bad practices for widely used IaC languages, all using grey literature materials. Method. In this paper, we systematically analyze the industrial grey literature on IaC, such as blog posts, tutorials, and white papers, using qualitative analysis techniques. Results. We proposed a definition for IaC and distilled a broad catalog summarized in a taxonomy consisting of 10 and 4 primary categories for best practices and bad practices, respectively, both language-agnostic and language-specific ones, for three IaC languages, namely Ansible, Puppet, and Chef. The practices reflect implementation issues, design issues, and the violation of/adherence to the essential principles of IaC. Conclusion. Our findings reveal critical insights concerning the top languages as well as the best practices adopted by practitioners to address (some of) those challenges. We evidence that the field of development and maintenance of IaC is in its infancy and deserves further attention.

Email addresses: i.p.k.weerasingha.dewage@tue.nl, m.garriga@uvt.nl, angeluromeu88@gmail.com, d.dinucci@uvt.nl, d.a.tamburri@tue.nl, w.j.a.m.vdnheuvel@uvt.nl (Indika Kumara, Martín Garriga, Angel Urbano Romeu, Dario Di Nucci, Damian Andrew Tamburri, Willem-Jan van den Heuvel), fpalomba@unisa.it (Fabio Palomba)
Preprint submitted to Information and Software Technology Journal, April 3, 2021


Introduction
The current information technology (IT) market is increasingly focused on the "need for speed": speed in deployment, faster release cycles, speed in recovery, and more. This need is reflected in DevOps, a family of techniques that shortens the software development cycle and intermixes software development activities with IT operations [1,2]. As part of DevOps, infrastructure-as-code (IaC) [3] promotes managing the knowledge and experience inside reusable scripts of infrastructure code, instead of traditionally reserving it for the manual-intensive labor of system administrators, which is typically slow, time-consuming, effort-prone, and often error-prone.
IaC represents a widely adopted practice [3,4,5]. However, little is known concerning its code maintenance, evolution, and continuous improvement in academic literature, despite increasing traction in most if not all domains of society and industry: from Network-Function Virtualisation (NFV) [6] to Software-Defined Everything [7] and more [8].
However, little academic literature exists on infrastructure code, since research on IaC is still in its early phases. At the same time, companies are working day-by-day on their infrastructure automation, as also witnessed by the considerable amount of grey literature on the topic. Hence, we can observe a gap between academic research and industry practices, particularly concerning the technical, operational, and theoretical underpinnings and the best and bad practices for developing IaC in the most popular languages.
This paper aims at addressing this gap with a systematic grey literature review. We shed light on the state of the practice in the adoption of IaC by analyzing 67 high-quality sources and the fundamental software engineering challenges in the field. In particular, we investigate: (1) how the industrial researchers and practitioners characterize IaC, and (2) the best/bad practices during general (i.e., language-agnostic) and language-specific (e.g., Puppet, Chef, Ansible) IaC development.
Specifically, we derived a more rigorous definition for infrastructure code. We identified a taxonomy consisting of 10 primary categories for best practices and 4 for bad practices, with all practices reflecting key improvements for DevOps processes around IaC. The practices reflect a range of scenarios: (a) implementation issues (e.g., naming convention, style, formatting, and indentation), (b) design issues (e.g., design modularity, reusability, and customizability of the code units of the different languages); (c) violation of/adherence to the essential principles of IaC (idempotence of configuration code, separation of configuration code from configuration data, and infrastructure/configuration management as software development). Overall, these issues highlight that the field of development and maintenance of IaC deserves further attention and further experimentally-proven tooling.
The rest of this paper is organized as follows. Section 2 provides a background on IaC. Section 3 poses our motivation and problem statement based on related work in the field. The research design and the research questions are presented in Section 4, while the results are provided in Section 5. In particular, Section 5.1 summarizes the selected sources; Section 5.2 focuses on IaC definition, classification, and features; Section 5.3 and Section 5.4 present best and bad practices of IaC development. We discuss the implications of our findings in Section 6 and the threats to validity in Section 7. Finally, Section 8 concludes the paper.


Background

Infrastructure-as-code (IaC) is a process for managing and provisioning the computing environments in which software systems are deployed and managed, through reusable scripts of infrastructure code [3]. In this section, we briefly introduce the three IaC technologies considered in this paper: Ansible [9], Puppet [10], and Chef [11]. We consider these IaC technologies because they are the most popular languages amongst practitioners according to our previous survey [12]. Below we present only a subset of the constructs of each IaC language, and introduce the other constructs when we discuss the best and bad practices of each language.

Ansible
In Ansible, a playbook defines an IT infrastructure automation workflow as a set of ordered tasks over one or more inventories consisting of managed infrastructure nodes. A module represents a unit of code that a task invokes.
A module serves a specific purpose, for example, creating a MySQL database or installing an Apache web server. A role can be used to group a cohesive set of tasks and resources that together accomplish a specific goal, for example, installing and configuring MySQL. Figure 1(a) shows an example Ansible playbook with tasks such as Install httpd and Create database. Each task uses a module to achieve its objective; for example, the task Install httpd uses the module yum. The inventory file defines the web server node webs1 and the database server node dbs1. The playbook applies the two roles to the two nodes.
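To make the constructs concrete, the following is a minimal sketch of such a playbook and inventory; the role names, node names, and module choices follow the running example, and the actual listing in Figure 1(a) may differ:

```yaml
# inventory (sketch): a web server node and a database server node
# [webservers]
# webs1
# [dbservers]
# dbs1

# playbook.yml: apply one role to each group of nodes
- hosts: webservers
  roles:
    - webserver   # e.g., includes a task "Install httpd" that uses the yum module

- hosts: dbservers
  roles:
    - database    # e.g., includes a task "Create database"
```

Each role bundles its tasks, handlers, and variables in a standard directory structure, so the playbook itself stays a thin orchestration layer.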

Puppet
In Puppet, a manifest declares a set of resources describing the desired state of a node, where resources can be grouped into classes. Figure 1(b) shows a Puppet manifest in which a class uses the resource type package to install the Apache web server. The node webs1 declares this class to add its resources to the node, while the dbs1 node creates a MySQL database employing the resource mysql::db.
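As an illustration, a Puppet manifest in the spirit of Figure 1(b) could look as follows; the class name and database parameters are illustrative assumptions, not taken from the figure:

```puppet
class webserver {
  package { 'httpd':
    ensure => installed,   # install the Apache web server
  }
}

node 'webs1' {
  include webserver        # declare the class to add its resources to this node
}

node 'dbs1' {
  # create a MySQL database using the mysql::db resource (puppetlabs-mysql module)
  mysql::db { 'mydb':
    user     => 'appuser',
    password => 'secret',
  }
}
```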

Chef
In Chef, a cookbook represents an IT automation workflow. It consists of a set of recipes, which are a collection of resources to be created and managed on a node. A resource declares a system component and the actions to create and manage the component, for example, installing the package Apache. Chef recipes are written using Ruby. Figure 1(c) shows a Chef recipe with two resources. The resource package is used to install Apache web server on the node webs1, and the resource mysql_database is applied to create a MySQL database on the node dbs1.
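A hedged sketch of such a recipe follows; the database name and connection settings are illustrative and not taken from Figure 1(c):

```ruby
# Install the Apache web server package on the node.
package 'httpd' do
  action :install
end

# Create a MySQL database; mysql_database is provided by the database cookbook.
mysql_database 'mydb' do
  connection(host: '127.0.0.1', username: 'root', password: 'secret')
  action :create
end
```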

Related Work and Research Goals
In this section, we discuss the related work and set forth our research goals.

Prior Research on IaC
Rahman et al. [13] recently performed a mapping study on IaC research and classified the existing work into several categories, which we summarize by providing an overview of frameworks and tools, empirical studies, and anti-pattern catalogs for IaC. Finally, we provide a summary of previous literature reviews on topics close to IaC.

Frameworks and Tools for IaC
The mapping study reported several tools/frameworks that extend the functionality of IaC languages [26] and replicated previous studies [27]. Finally, Opdebeeck et al. [28] analyzed the adoption of semantic versioning in Ansible roles, while Kokuryo et al. [29] examined the usage of imperative modules in the same language.

Antipattern and Practices Catalogs for IaC
No previous systematic literature review analyzed the good and bad practices that developers adopt when implementing IaC. Although no studies defined and characterized the concept of IaC using grey literature, some previous work [30,31] leveraged grey literature to compile code smell catalogs for different languages.
In particular, they relied on language style guides (e.g., the Puppet style guide) and smell detection rules implemented in linters (e.g., Puppet-Lint). Sharma et al. [30] developed a catalog of design and implementation smells for Puppet. These are well-known violations of best practices for configuration code identified by [25] and ten Kubernetes-specific security practices [24]. Guerriero et al. [12] identified four best practices and seven bad practices from their survey with practitioners.

Previous Literature Reviews concerning IaC-related topics
Previous grey and white literature surveys analyzed topics related to Infrastructure as Code such as DevOps tools [36], cloud resource orchestration techniques [37], and cloud deployment modeling tools [38]. These surveys describe the general capabilities of IaC tools and their usage scenarios. However, there is little or no information about the best and bad practices of using such tools.

Research goals
Given the work above, it is clear that best and bad practices regarding IaC should be derived from grey literature. Therefore, our goal is to analyze, assess, and summarize such literature sources systematically. We aim at compiling a comprehensive catalog of best and bad practices that developers should follow when developing and maintaining IaC projects. Please note that our scope is not limited to source code but encompasses the analysis of different aspects of IaC projects, reporting the practices for three widely used languages, namely Ansible, Chef, and Puppet. Finally, we aim to define and characterize the concept of IaC comprehensively.

Research Methodology
We build our research design upon the guidelines for systematic literature reviews (SLRs) in software engineering [39,40]. We also used recent grey and multi-vocal literature reviews [41,42,43,44] as a reference. Figure 2 shows the steps of our systematic grey literature review (SGLR) process.

Research Questions
We defined three research questions to achieve our research goals, as mentioned in Section 3.2.

Data Sources and Search Strategy
Similar to other multi-vocal and grey literature reviews [41,42,43,44], we employed the Google search engine to search the grey literature. We only consider textual sources such as reports, blog posts, white papers, and the official documentation of each IaC language. We first identified an initial set of search terms based on the research questions and created the following query:

Infrastructure as code (bug(s)|defect(s)|fault(s)|(anti-)pattern(s))
We obtained only a few sources; thus, we refined the generic terms of the query to include language-specific keywords. In this study, we only considered three IaC languages, namely Ansible, Puppet, and Chef. Two main reasons drive this choice. On the one hand, these are the most popular IaC languages in the industry according to our previous survey [12]. On the other hand, including a larger number of smaller and less popular languages would have resulted in a prohibitively expensive manual screening.
This resulted in the following query:

(ansible|puppet|chef) ((anti-)pattern(s)|(best|good|bad) practices)

We applied the above two queries on the Google search engine, scanning each resulting page until saturation, i.e., we stopped our search when no new relevant articles were emerging from the search results (as in [41,44]). We performed our search in incognito mode to avoid personal search bias (see Table 1).

Eligibility Criteria and Study Selection
Inclusion/Exclusion Criteria. We considered an article for further analysis only when it satisfied all inclusion criteria and did not satisfy any of the exclusion criteria. We included:
• Articles in English whose full text is accessible;
• Articles matching the focus of the study, i.e., concepts and characteristics, best/good and bad practices, bugs/defects/anti-patterns, and challenges concerning IaC in general or a specific IaC language.
We excluded:
• Articles not matching the focus of the study;
• Articles restricted by a paywall;
• Duplicate articles found from various sources;
• Short elements that do not contain sufficient data for our study, such as posts in forums and comments;
• Articles that do not provide the scope, consequences, and examples of the proposed best/bad practices and bugs/defects/anti-patterns.
Quality assessment. In addition to the aforementioned inclusion/exclusion criteria, we applied the following criteria, adopted from existing grey and multi-vocal literature review studies [41,45], to further assess the quality of the articles:
• Is the publishing organization reputable?
• Is an individual author associated with a reputable organization?
• Has the author published other work in the field?
• Does the author have expertise in the area?
• Does the source have a clearly stated purpose?
• Is the source recent enough (i.e., within the last three years)?
The validation was mainly carried out by two of the authors of this paper. The authors distributed the material between them nearly equally, and each validated only their corresponding instances. In problematic cases, the authors mutually agreed on whether a specific document should be considered. Whenever they did not agree, they discussed with one or more other authors of the paper to resolve the disagreement. Initially, we considered only the sources that met at least three of the above criteria, for a total of 50 sources. We could not answer some of the criteria/questions for the remaining sources; for example, in some cases, there was no indication of the publishing organization or author. From the remaining sources, we selected another 15 relevant ones (given that they covered the topic of interest), albeit not meeting those criteria. All in all, we proceeded with a final set of 67 selected sources.

Data Synthesis and Analysis
To attain the results for answering our research questions, we read, synthesized, and analyzed the selected industrial sources following a qualitative analysis process [46]. In particular, to systematically obtain codes, groupings, and categories related to the research questions, we followed a series of steps:
1. Pilot study. A first set of 20 sources was randomly selected and analyzed independently by two researchers to establish an initial set of codes, using Structural and Descriptive Coding [47]. The codes were extracted by conceptualizing all the information stemming from the sources related to the research questions. This phase included a constant back-and-forth check of the codes, continuously refining them to sharpen the growing theory.

2. Inter-rater assessment (pilot study). After the coding phase, the two researchers performed an inter-rater assessment on the codes to appraise each other's codes and reach unanimity on their names, types, and categories. This process led to the change of several codes, reaching uniformity and rigor among them.
3. Full dataset coding. The rest of the sources were coded next, following the consensus reached by the two researchers in the previous step. The sources were split into two halves, each analyzed and coded by one of the researchers independently.

4. Inter-rater assessment (full study). After the independent code extraction, the researchers inspected each other's codes to assess the work and make sure everything was coded as previously agreed. All discrepancies were resolved via discussions.

5. Grouping. The emerging theory led us toward overarching groups of codes: best practices, bad practices, IaC definitions, IaC advantages, and IaC challenges. These groups were then analyzed and further decomposed as necessary. The best/bad practices were grouped into language-agnostic, Ansible, Chef, and Puppet practices.
6. Categorization of Practices. We categorized the identified atomic best and bad practices into coarse-grained categories. For the candidate practice categories, we used the practice categories proposed in the existing literature [48,49,50,51,52,53]. We also employed the categories proposed in the Common Weakness Enumeration (CWE), which was used by several studies on smell and bug taxonomies [54,33]. The first author of this paper first compiled a list of candidate categories. Next, the first author and the fourth author independently assigned the atomic practices to a subset of the candidate categories. We resolved all discrepancies through discussions. When needed, we modified some of the selected candidate categories to reflect the IaC context.

Replication Package
To enable further validation and replication of our study, we made the related data available online. It contains the full list of sources, the qualitative analysis (codes, groupings, and analysis) performed with the Atlas.ti tool, the keywords and phrases extracted to define IaC in RQ1, the atomic best/bad practices, and the final taxonomy derived for RQ2 and RQ3.

Analysis of the Results
In this section, we provide an overview of the selected grey literature and answer each research question in detail. Table 2 provides an overview of the selected studies, including contribution type, content type, IaC language, publication year, and publication venue.

Overview of the Selected Grey Literature
The sources cover articles from IT companies with a high reputation (e.g., Microsoft and IBM), official communication channels of the IaC languages/tools (Ansible, Chef, Puppet), articles on online publishing platforms (e.g., DZone and Medium), and blogs. In general, IaC patterns have been discussed as frequently as their anti-pattern counterparts. The selected literature has also reasonably considered the characteristics, advantages, and challenges of IaC.

RQ1: Definition Proposal for IaC
In this section, we use the definitions, classifications, and features of IaC found in the selected studies to provide an integrated definition for IaC. We gathered the definitions of IaC from six different sources in our grey literature and extracted the keywords and concepts from the corresponding text by manually scanning the sources. The extracted data is available in the replication package of this study. We identified three major dimensions to explain IaC: the types of management operations supported by IaC, the methods for implementing such operations, and the desired properties of the managed environment.
• Software/Platforms are used to deploy, run, and manage applications, such as programming languages, frameworks, libraries, services, and tools. IaC supports (1) defining the desired state of the software/platform (e.g., MySQL is installed with the root user), and (2) installing, (re)configuring, and uninstalling the software/platform based on its definition.

Methods
IaC replaces the conventional processes used to manage a computing environment with a process that enables applying software engineering practices. Instead of low-level shell scripting languages, the IaC process uses high-level domain-specific languages (DSLs).
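The contrast can be sketched as follows (a minimal, assumed example; the package name and module choice are illustrative):

```yaml
# Low-level shell scripting (what IaC replaces):
#   if ! rpm -q httpd; then yum install -y httpd; fi

# High-level declarative IaC (here, an Ansible task): state the desired
# end state; the tool decides whether any action is actually needed.
- name: Ensure Apache is installed
  yum:
    name: httpd
    state: present
```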

Properties of Managed Environments
From the selected literature, we can find the environment properties that enable IaC and those that IaC induces. Figure 5 depicts the frequency of codes for those properties.
• Virtualization enables on-demand provisioning of fundamental computing resources such as virtual machines and containers and is thus considered a prerequisite for IaC. Virtualization provides an additional layer of abstraction for provisioning and configuration.
• Software-defined/Programmable: virtualized resources render the infrastructure software-defined, as their management functions are now codified and exposed as APIs.
• Consistency among multiple environments (development, test, production) can be achieved using IaC. Indeed, IaC helps eliminate so-called environment drift.
• Auditability of the computing environment is the ability to track and trace the changes to the environment. As changes to the environment are performed by changing the corresponding IaC source code, version-controlled IaC provides a detailed audit trail for changes.
• Reproducibility of the environment indicates the degree to which a given environment can be easily, rapidly, and consistently recreated. With IaC, a given version of an environment can be provisioned using the same version of the environment definition model stored in the source code repository.

An Integrated Definition of IaC
Infrastructure-as-Code (IaC) is a model for provisioning and managing a computing environment using the explicit definition of the desired state of the environment in source code and applying software engineering principles, methodologies, and tools. IaC DSLs enable defining the environment state as a software program, and IaC tools enable managing the environment based on such programs.
The managed computing environment comprises three key types of computing resources (i.e., infrastructure, software/platform, and application) and exhibits six essential properties (i.e., virtualized, software-defined/programmable, immutable, auditable, consistent, and reproducible). The management of the environment encompasses the management of the lifecycle of each computing resource.

IaC Definition
The term IaC can be comprehensively analyzed and described using three major dimensions: (1) the three types of management operations supported by IaC, (2) the methods for implementing such management operations with IaC, and (3) the six desired properties of the managed environment.

RQ2: IaC Best Practices
In this section, we present the IaC best practices extracted from the selected literature through the mixed-methods approach. Table 3 shows the categories of the IaC best practices described in the selected studies, extracted using the qualitative analysis. Each best practice category consists of several sub-categories.
For each sub-category, the sources, the number of related codes in the sources (code frequency), and the number of atomic practices are also shown. The code frequency of a sub-category is obtained by summing the code frequencies of its atomic practices.

Practice 2 - Do not repeat yourself (or others)
This category relates to the IaC best practices that aim to reduce code clones and increase the reuse of IaC programs and tools.
Reuse the tools that the community uses (2d). IaC practitioners recommend using the tools and deployment patterns developed by the respective communities. For example, Chef recommends specific tools and systems for testing and sharing cookbooks, generating data bags, and managing dependencies between cookbooks.

Modularize
Similarly, there are deployment architectures and development tools approved by the Ansible and Puppet communities.
Package applications for deployment (3b). Furthermore, an application should be packaged to ease its deployment and execution, for example, as a .war file for a web application or a Docker image for a web server. Packaging can reduce the amount of code needed in later stages of configuration management.

Practice 3 - Let the IaC tools do the work
Do not violate the immutability and reproducibility of your infrastructure (3c). As discussed in Section 5.2, infrastructure immutability simplifies management and improves predictability. Environment templates (e.g., Docker and Docker Compose files) simplify cloning, sharing, and versioning environments.

Practice 4 - Make incremental changes
This category relates to the practices that aim to ensure managed changes to the infrastructure configurations.
Use a version control system (4a). In Chef, a Role is a logical way to group nodes (e.g., web servers and databases). Each role can include zero or more attributes and a run-list of recipes. Roles are not versionable; therefore, they should not be used to keep recipe run-lists, which should instead be assigned as the default recipes of cookbooks.
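A sketch of this recommendation in Chef follows; the role and cookbook names (e.g., mycompany_web) are hypothetical:

```ruby
# roles/web.rb -- roles group nodes but are NOT versioned.
name 'web'
description 'Web server nodes'
# Discouraged: keeping the full recipe run-list in the role, e.g.,
#   run_list 'recipe[apache]', 'recipe[firewall]'
# Preferred: delegate to the default recipe of a versionable cookbook.
run_list 'recipe[mycompany_web]'

# cookbooks/mycompany_web/recipes/default.rb -- versioned with the cookbook.
include_recipe 'apache'
include_recipe 'firewall'
```

This way, changes to the effective run-list go through the cookbook's version history rather than an unversioned role.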

Practice 5 - Prevent avoidable mistakes
This category includes the practices that can be used to prevent introducing faulty behaviors to IaC codes.
Use the correct quoting style (5a). The improper usage of quotes can introduce undesired errors such as the erroneous interpolation of values. Thus, each IaC language provides best practices for using single quotes and double quotes.
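For instance, the Puppet style guide prefers single quotes for plain strings and double quotes only when interpolation is intended; a minimal sketch (the file and variable are illustrative):

```puppet
$pkg = 'httpd'                # single quotes: no interpolation intended

file { '/etc/motd':
  # double quotes with ${} braces: interpolation is intended and explicit
  content => "Managed by Puppet; package ${pkg} is installed.\n",
}
```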
Avoid unexpected behaviors whenever possible (5b). Some best practices aim at preventing the introduction of unexpected problematic behaviors.
Do not ignore errors (6b). Errors during task execution should not be ignored but handled with the error handling capabilities of Ansible, to ensure that the execution of a task leaves the node/infrastructure in the desired state.
The states of some resources (e.g., services or daemons) can also be checked explicitly to verify the task execution.
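A hedged Ansible sketch of this practice, combining block/rescue error handling with an explicit check of the resulting service state (task and service names are illustrative):

```yaml
- name: Configure the web tier
  block:
    - name: Ensure Apache is installed
      yum:
        name: httpd
        state: present
  rescue:
    - name: Fail with a clear message instead of silently ignoring the error
      fail:
        msg: "Apache installation failed; the node may not be in the desired state."

# Explicitly verify the state of the service after the task runs.
- name: Ensure Apache is running
  service:
    name: httpd
    state: started
```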
Use off-the-shelf testing libraries (6c). Off-the-shelf libraries can also be used for collecting metrics about the managed environment.

Practice 7 - Document little but well
This category includes the guidelines for appropriately documenting IaC programs.
Code as documentation (7a). Source code should be treated as documentation.
This practice ensures that documentation is always current because it is part of the code and stored in a central repository. The inclusion of additional configuration instructions for users in a manual is discouraged as it can lead to infrastructure-documentation inconsistencies and non-reproducible environments.
Use document templates (7b). Templates and tools can be used to produce consistent documents. For example, as the roles are created to be shared across different projects typically using Ansible Galaxy, the recommendation is to document the roles consistently and sufficiently by using the template generated by Ansible Galaxy.

Practice 8 - Organize repositories well
This category of best practices focuses on the proper organization of IaC code repositories, to foster improved and secure collaboration among the users of the repositories and to improve the understandability and predictability of each repository.
Modularize repositories (8a). The general recommendation is to use a single version-controlled IaC code repository (per organization), separated from the application source code repository. Chef considers cookbooks as standalone, self-contained applications, and thus a separate repository for each cookbook is desired. In Puppet, a control repository is a version-controlled repository that stores code, data, and modules or references to locations in other repositories.
Different repositories, per each artifact, can allow separate access and development cycles of artifacts, such as complex Puppet modules, global data about the organization, and Puppet profiles.
Use standard folder structures (8b). Each IaC language recommends a specific directory layout while allowing some variations for their IaC projects. The recommended structure for the top-level directory of an Ansible project consists of inventory directories, variable (group and host) directories, custom module and plugin directories, a master (top-level) playbook file, role playbook files, and role directories. A Chef project includes three main sub-directories: cookbooks, data_bags, and policyfiles (i.e., groups of cookbooks and settings for specific systems). The recommended structure for a Puppet module includes data, files, functions, hiera.yaml, lib, manifests, metadata.json, plans, and tasks.
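As an illustration, one common variant of the recommended Ansible layout is sketched below; the directory names follow the Ansible documentation, while the concrete role names are assumptions:

```
site.yml                 # master (top-level) playbook
webservers.yml           # role playbook file
inventories/
  production/
    hosts                # inventory file for the production environment
    group_vars/          # variables per group of hosts
    host_vars/           # variables per individual host
library/                 # custom modules
filter_plugins/          # custom plugins
roles/
  webserver/             # role directories
  database/
```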

Practice 9 - Separate configuration data from code
This category concerns the practices that aim to improve the management of configuration data.
Use a configuration data source (9a). When the number of managed components exceeds a certain threshold, the recommendation is to use a separate storage system to keep configuration data about those components (e.g., user names and server IPs). Ansible and Chef use file systems and version-controlled repositories (i.e., inventory files and data bags, respectively), and Puppet has Hiera, a key-value data store.
Such configuration data stores can also help avoid hard-coding parameter defaults and private data in IaC programs.
Modularize configuration data (9b). The configuration data can also be modularized to improve its maintainability and usage. Environments such as test, development, and production exhibit differences in the number and configurations of resources, and thus environment-specific data are needed. In general, the same set of IaC code scripts deploys these environments. Building reusable IaC code requires making the code configurable for many sites/environments.
Using a separate data source (e.g., Ansible inventory files) for each environment minimizes the errors that can occur due to mixing configuration data from different environments. Moreover, the host data can be grouped based on the hosts' roles (e.g., web server and database server) to improve the modularity of an inventory file. In Puppet, Hiera hierarchy can represent the hierarchical organization of an environment.
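For example, an Ansible inventory file for a single environment might group hosts by their roles as follows (host and group names are illustrative):

```ini
# inventories/production/hosts
[webservers]
webs1
webs2

[dbservers]
dbs1

[webservers:vars]
http_port=80
```

A sibling inventory under, say, inventories/test/ would define the same groups with test-specific hosts, keeping the data of the two environments strictly separated.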
Select data sources wisely (9c). IaC languages provide different options to store configuration data, and there exist guidelines for making appropriate choices considering different trade-offs. In Ansible, an inventory file defines the nodes in one or more target environments. The number and the properties of the nodes can be static or can change at runtime, for example, in elastic environments such as public clouds. Thus, the best practices relate to organizing the content to maintain modularity and separation of concerns. Dynamic inventories can help to separate different environments in a dynamic and loosely-coupled way. In Chef, both data bags and environment files can store configuration data. Compared to data bags, accessing data from environment files incurs little or no overhead; environment files are thus preferred over data bags for storing environment-specific settings. While an environment file is a natural place to keep server IP addresses, using service discovery tools is recommended due to the dynamic nature of server IP addresses. Recipes use conditionals on environment names or attributes to apply different resources in different environments. As the attributes can be overridden on a per-environment basis, they can implement environment-specific changes without the complexity added by conditionals.
Use configuration templates (9d). The configuration template pattern is the best practice for managing configuration files. It is recommended to externalize all template variables as input parameters (i.e., defining a public API) to write a self-contained, reusable template that does not directly refer to the attributes of the IaC code using it. The best practice is to create parameterized, reusable templates using variables and conditional logic to cater to variations in site-specific configurations, such as using Apache for the test and production environments.
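A minimal sketch of such a template in Ansible's Jinja2 syntax; every variable is an explicit input parameter, and the variable names (http_port, server_name, env_name) are assumptions:

```jinja
{# templates/httpd.conf.j2: self-contained, parameterized template #}
Listen {{ http_port }}
ServerName {{ server_name }}
{% if env_name == 'production' %}
KeepAlive On
{% else %}
KeepAlive Off
{% endif %}
```

Because the template refers only to its declared inputs, it can be reused across playbooks and environments by simply passing different variable values.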

Practice 10 - Write secure code
This category concerns secure data and software management.
Isolate secrets from code (10a). The key recommendation is to isolate secrets (sensitive information) such as passwords and private ssh keys from IaC code.
Then, the isolated secrets should be injected into the deployment workflow as necessary during its execution.
Protect your data at rest (10b). The sensitive data should be encrypted when they are stored. Each IaC language supports a Vault for isolating and storing secrets securely.
Use facts from trusted sources (10c). IaC programs may use information or facts about the environments to make decisions such as classifying nodes and selecting suitable modules to manage them. Some IaC systems, for example, Puppet, provide mechanisms to collect and access facts securely (e.g., Puppet's $trusted fact hash).
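A brief Puppet sketch of node classification via trusted facts (the profile name and hostname pattern are hypothetical): values in the $trusted hash are extracted from the node's certificate and, unlike regular facts, cannot be forged by the node itself.

```puppet
# Classify nodes using the certificate-backed $trusted hash rather than
# a self-reported fact such as $facts['hostname'], which a compromised
# node could spoof.
if $trusted['certname'] =~ /^web\d+\.example\.com$/ {
  include profile::webserver   # hypothetical profile class
}
```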
Use standard secure coding practices (10d). IaC developers also recommend secure coding practices such as secure logging and the principle of least privilege.
In Ansible, to prevent displaying or logging decrypted data, the output of tasks should be selectively suppressed. The execution of some tasks requires special privileges, such as the root user or some other user (e.g., when installing software). Enforcing fine-grained access control at the level of a task or a block of tasks, instead of globally at the playbook or role level, is also recommended. The best practices for writing Puppet tasks include following the secure programming practices relevant to a given task and using the task parameter metadata to explicitly declare whether a parameter holds sensitive data.
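The two Ansible recommendations above, selective logging and task-level privilege escalation, can be sketched as follows (the helper script path is hypothetical):

```yaml
# Suppress the sensitive output of this one task only.
- name: Fetch an API token from the secret store
  ansible.builtin.command: /usr/local/bin/get-token  # hypothetical helper
  register: token_result
  no_log: true          # decrypted value is kept out of logs and stdout
  changed_when: false

# Escalate privileges for the single task that needs root,
# instead of setting `become: true` on the whole play or role.
- name: Install the package that requires root
  ansible.builtin.package:
    name: nginx
    state: present
  become: true
```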

IaC Best Practices
We identified 10 primary categories of IaC best practices, sub-categorized into 33 lower-level categories. The practices cover each of the key constructs/abstractions of the IaC languages. They reflect both implementation issues (e.g., naming conventions, style, formatting, and indentation) and design issues (e.g., design modularity, reusability, and customizability of the code units of the different languages).

RQ3: IaC Bad Practices
In this section, we present the IaC bad practices reported in the selected grey literature. Table 4 shows the categories and sub-categories of the IaC bad practices. For each sub-category, the sources, the number of related codes in the sources (frequency of codes), and the number of atomic practices are also shown.

Violating idempotence (1b). The violation of the idempotence property and the gradual creation of configuration drift due to ad-hoc changes not managed through IaC can result in non-reproducible environments. In Ansible, imperative modules such as command and shell, which execute ad-hoc operating system commands, can break idempotence. Thus, the execution of tasks using such modules should be guarded by conditionals that check the state of the infrastructure and its components. In Puppet, imperative statements (such as the resource type exec, which runs ad-hoc OS commands) can break one of the key philosophies of the language: the declarative configuration model [30].
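A short Ansible sketch of the guard recommended above (the installer path and marker file are hypothetical); Puppet's exec resource offers analogous `unless`/`creates` guards:

```yaml
# Bad: the ad-hoc command runs on every execution, breaking idempotence.
- name: Install the tool (non-idempotent)
  ansible.builtin.shell: /tmp/install.sh

# Better: guard the command so it runs only when the desired state
# is not already in place.
- name: Install the tool only if it is not already present
  ansible.builtin.command: /tmp/install.sh
  args:
    creates: /usr/local/bin/mytool  # skip the task if this file exists
```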
Using non-reproducible images and environments (1c). Images for application components are generally created by extending a base or foundation image. If the base images are crafted manually without explicit specifications (e.g., Dockerfiles) and such images get lost, updating and using those base images becomes difficult. The environments can also become non-reproducible due to manual or external updates outside automated IaC workflows.

Practice 2 - Not writing IaC programs for people
This category consists of the bad development practices that reduce the readability, understandability, and testability of IaC scripts. In Puppet, the data in profile modules specify profile defaults. Keeping these data in the same repository as the code is discouraged, as it makes it difficult to delegate the development of profiles and their defaults to different teams (e.g., a profile team and a module team).
Improper version control (3b). Forking a community module/library without wrapping it, or copying and pasting code to implement the same application in multiple environments, is not recommended. The use of non-versionable code units, such as Chef roles, is also discouraged, as changes to them have global effects.

Practice 4 - Not writing secure code
This category concerns violations of security practices for data and software management.
Hard coding information (4a). Postponing the separation of secrets from the code and hard-coding sensitive information make the code less reusable, less customizable, and more vulnerable.
Not using built-in security tools and mechanisms correctly (4b). Puppet offers several security functions, such as secure fact access and a module for auto-signing certificates. Not relying on this built-in support is a bad practice. However, users should evaluate the security vulnerabilities of the selected tools or mechanisms before deciding to use them. For example, Chef provides data bags as a way to manage secrets within Chef. However, using data bags is discouraged because a single decryption key is used for all secret information, and that key is distributed to every node.

IaC Bad Practices
We identified 4 primary categories of IaC bad practices, sub-categorized into 10 lower-level categories. While most of these practices concern design and implementation issues related to key constructs/abstractions of IaC languages, they also reflect violations of the essential principles of IaC: idempotence of configuration code, separation of configuration code from configuration data, and infrastructure/configuration management as software development.

Discussion, Highlights, and Observations
This section compares our work with existing academic literature and provides some observations on performing systematic grey literature reviews.

Comparison with IaC Practices Reported in the Existing Academic Literature
As discussed in Section 3, several studies [30,31] on IaC smells have used best/bad practices as the sources of the smells. However, these studies considered only a subset of the atomic practices: 32 practices for Puppet [30], and a similar number for Chef [31]. We also observed that the reported best/bad practices are related only to a subset of the constructs/concepts of a given IaC language.
For example, in Spinellis et al. [30], the practices concerning the Puppet constructs roles, profiles, tasks, configuration files, and Hiera (the configuration data store) have not been used. We found 10 and 4 primary categories of IaC best and bad practices, respectively, and considered each construct in three different IaC languages. Thus, our findings imply that new types of IaC smells can be recognized, categorized, and detected.
Our survey with industrial researchers and practitioners revealed seven IaC bad practices and four best practices [12]. Interestingly, all of those practices are also part of our catalog, albeit a few are only indirectly related. We found that keeping documentation minimal is a best practice, as IaC code should act as its own documentation, which reduces the potential for inconsistencies between code and documentation. However, that survey reports poor documentation of IaC code as a bad practice.
Rahman et al. [33] identify seven security smells for IaC by qualitatively analyzing IaC (Puppet) code scripts, which also reflect insecure coding practices. Our catalog includes only two of those seven smells: admin by default and hard-coded secrets. Smells such as empty passwords, suspicious comments, and weak cryptography algorithms have not been reported in our selected industrial studies. However, we found new security best practices: (1) secure logging (not showing decrypted secrets in logs); (2) using facts (node properties) from a secure in-memory data store (the $trusted fact hash); and (3) explicitly indicating whether the value of a parameter contains sensitive information using parameter metadata.
By qualitatively analyzing defect-related commits, Rahman et al. [34] identified several categories of IaC defects. Interestingly, each of these defects was indicated by at least one of the best/bad practices in our catalog. For example, the improper use of imperative commands/modules (e.g., the shell and command modules in Ansible, and the bash resource in Chef) can result in non-idempotent IaC code and is thus discouraged.
Hence, the violation of best practices and the application of bad practices are potentially good early indicators of bugs.

Observations on Systematic Grey Literature Reviews
In this section, we observe the major difficulties and potentials of conducting systematic grey literature reviews.
Assessing the quality of grey literature. White literature typically conforms to a pre-specified format, including an abstract, keywords, introduction, methodology, results, evaluation, and page limits. However, this does not hold for grey literature, which comprises unique source types such as blogs, white papers, slides, and language guides. It is also worth highlighting that extracting data from slide decks is even more challenging, because they usually lack details and the corresponding video or audio presentations.
Moreover, grey literature assessment cannot rely on the criteria used to assess white literature sources (e.g., venue ranking, h-index, and acceptance rate within the relevant research community). We found that the auxiliary content of grey sources, such as comments, likes, dislikes, and shares, is a good indicator of their quality. However, most sources in our study did not have sufficient auxiliary content. Therefore, to address these challenges in selecting quality grey articles, we applied a set of quality assessment criteria that focus on the reputation of the venues, the expertise of the authors, and the clarity of the content of the article (see Section 4.3).
Lessons learned from applying Natural Language Processing techniques. Applying NLP techniques for the automated analysis of grey literature can complement manual qualitative analysis and strengthen its findings. We initially attempted to apply topic modeling and topological data analysis.
In topic modeling, the main idea is to extract latent topics from the data in an automated and unsupervised way. We compared the obtained topics with the codes and groups resulting from the qualitative analysis. Then, we established a mapping between groups and topics that allowed us to confirm the qualitative analysis findings and further refine them.
Topological Data Analysis (TDA) represents the shape of unstructured data through a topological network. This technique provides a map of all the data set points: the closer the points, the closer their meanings. In our study, the points in the topology graph represent distributions of words. The resulting clusters were interpreted separately to find their meaning and to compare the TDA results with the topic modeling and manual qualitative analysis findings.
Most of the selected grey literature contained both text and code. In some cases, there was no clear separation between text and code via a special notation, preventing programs from differentiating them correctly. Moreover, many articles contain different content types, such as practices, features, examples, and concepts. Thus, it was difficult to programmatically separate content belonging to different topics with sufficient accuracy. These deficiencies in the pre-processing of grey literature resulted in low-quality data and overly broad, less meaningful topics.
Thus, we decided to rely solely on manual qualitative analysis. We argue that more research is needed to derive methodologies and guidelines for applying NLP and data-driven techniques in systematic grey literature reviews.
Investigation approaches for handling fast-growing materials. As grey literature resources are generally published more often than white literature resources, methodologies and tools for coping with fast-growing grey literature would be valuable, for example, a tool to (semi-)automatically update the literature review results or predict their relevance. Such tools may also have implications for how the review results are recorded. For example, a taxonomy could be codified as an ontology [55], which can be shared, reused, and semi-automatically updated.

Threats to Validity
In this section, we outline the threats to validity that may apply to our study.

Threats to External Validity
External validity concerns the applicability of a set of results in a more general context [46]. Although our primary studies are obtained from many online sources, our results and observations may be only partially applicable to the broad area of practices and general disciplines of IaC, hence threatening external validity.
There is a risk of having missed relevant grey literature because concepts related to those included in our search strings are differently named in such studies. Some studies may refer to patterns, anti-patterns, or smells instead of best/bad practices. To mitigate this, we have explicitly included all relevant synonyms and similar words in our search strings. We have also exploited the features offered by search engines, which naturally support considering related terms for all those contained in a search string. Items found using the search terms have been assessed thoroughly based on various dimensions of quality [56].
Finally, while there are many IaC tools, we could study only three IaC tools due to practicality. To partially mitigate this threat, we selected the three most relevant tools that the practitioners currently use. We recognize that a comparison between the three selected tools and other home-grown solutions would lead to additional insights. We leave, however, such an analysis for future research work.

Threats to Construct and Internal Validity
Construct validity concerns the generalizability of the constructs under study, while internal validity concerns the validity of the methods employed to study and analyze data (e.g., the types of bias involved) [46].
We organized at least four feedback sessions during our systematic analysis.
We analyzed the discussion following-up from each feedback session, and we exploited this qualitative data to fine-tune both our research methods and the applicability of our findings. We also prepared an online appendix containing all artifacts we produced during our analysis, including the full list of sources, codes, groups, and distilled best/bad practices (see Section 4). We are confident that this can help make our results and observations more explicit and applicable in practice.
Furthermore, we adopted various triangulation rounds, inter-rater reliability assessments, and quality control factors (recall Section 4). We applied inter-rater reliability assessment in at least two phases, primary source selection and the coding process, both for the pilot study and for the final study with the full set of sources.
We added new studies and codes in the respective stages, although without performing a further inter-rater assessment. All in all, the risk of observer bias is always present when using this method.

Threats to Conclusions Validity
Threats to conclusions validity concern the degree to which the study conclusions are reasonably based on the available data [46].
To mitigate this threat, we again exploited theme coding and inter-rater reliability assessment to limit observer bias and interpretation bias, with the ultimate goal of performing a sound analysis of the data we retrieved. Additionally, the conclusions in this article were drawn independently by the different authors and then double-checked against the selected industrial studies and/or related studies in one of our feedback rounds.
Overall, we acknowledge that our empirical investigation is limited to analyzing the practitioners' perception distilled from the grey literature. These findings, however, are in line with those stemming from focus groups with DevOps practitioners [12], and we are currently working to complement those studies with a large-scale mining software repositories investigation [57] on GitHub of how DevOps practitioners treat infrastructure code.

Conclusions
DevOps is a family of tactics that accelerate the deployment and delivery of large-scale applications. DevOps automation is driven by infrastructure code: the series of blueprints laying out the application infrastructure, its dependencies, and middleware across a DevOps pipeline.
This paper investigated infrastructure code language/tools and best/bad practices from a practitioner perspective by addressing grey literature in the field, stemming from 67 selected sources, and systematically applying qualitative analysis. We distilled a taxonomy consisting of 10 and 4 categories of best and bad IaC practices, respectively. We believe that this catalog, along with the categorization we provide, can be valuable for both practitioners and researchers.
The former can benefit from comprehensive guidelines of "do's and don'ts" when developing IaC scripts. The latter can find foundations for further research, e.g., on IaC patterns and anti-patterns/smells, which is part of our future work.
Our findings reveal critical insights concerning the top languages and the best practices adopted by practitioners to address (some of) those challenges.
Overall, the most direct conclusion stemming from our evidence is that the field of software maintenance, evolution, and security of IaC is in its infancy and deserves further attention. On the one hand, several best practices exist, but they mostly address the complexities inherent in IaC. On the other hand, many challenges remain, such as conflicting best practices, lack of testability, security/secrets management issues, and monitoring.
Our future research agenda is based on the main findings of our grey literature review. We plan to provide automated mechanisms to recommend when and how to apply best practices in IaC code and pinpoint the bad practices affecting it. Furthermore, we also plan to replicate our investigation by considering more