MLOps: A Taxonomy and a Methodology

Over the past few decades, the substantial growth in enterprise-data availability and the advancements in Artificial Intelligence (AI) have allowed companies to solve real-world problems using Machine Learning (ML). ML Operations (MLOps) represents an effective strategy for bringing ML models from academic resources to useful tools for solving problems in the corporate world. The current literature on MLOps is still mostly disconnected and sporadic. In this work, we review the existing scientific literature and we propose a taxonomy for clustering research papers on MLOps. In addition, we present methodologies and operations aimed at defining a ML pipeline to simplify the release of ML applications in industry. The pipeline is based on ten steps: business problem understanding, data acquisition, ML methodology, ML training & testing, continuous integration, continuous delivery, continuous training, continuous monitoring, explainability, and sustainability. The scientific and business interest and the impact of MLOps have grown significantly over the past years. The definition of a clear and standardized methodology for conducting MLOps projects is the main contribution of this paper.


I. INTRODUCTION
In the last decades, Machine Learning (ML) has emerged as a powerful tool to solve complex real-world problems such as stock prediction [1], biomedical image analysis [2]- [4], autonomous driving [5], and fraud detection [6]. Since data availability has reached levels never seen before, businesses around the world are working to leverage these data and process them automatically exploiting the generalization power of ML, as to take actions and decisions [7].
In most real-world applications, data are constantly changing. This implies that ML models need to be retrained or, in the worst-case scenario, the entire ML pipeline has to be rebuilt to tackle feature drift [8], [9]. A more frequent, faster and simpler release cycle helps meet any regulatory or business changes. To achieve industrial growth, standardized production methods are required [10]- [12]. To industrialize ML models, a good set of production methods must be applied [13]. One of the key elements in facilitating the development of industry-leading companies is to improve communication between Science Technology Engineering Math (STEM) professionals and industry leaders or industry professionals by adopting a proven set of steps for industrializing ML solutions [14], [15].
Machine Learning Operations (MLOps) is a candidate to define these standardized production methods [16], [17]. MLOps can be viewed as the iterative process of pushing the latest best ML models to production [18], [19]. In fact, conducting an MLOps project means supporting automation, integration and monitoring at all stages of building an ML system, including training, integration, testing, release, deployment and infrastructure management [20], [21].
MLOps was born from different fields: ML, Development and Operations (DevOps) and data engineering (Fig. 1). Of the three fields, DevOps had the biggest impact on MLOps development. DevOps is a method of thought and practice that aims to improve and remove as much as possible the friction between development and operations (implementation and integration), seeing them as a single process [22]. The goal of DevOps is to study ways to improve service quality and features in order to meet customer needs [23], [24]. The primary links between MLOps and DevOps are the concepts of continuous integration (CI) and continuous delivery (CD), which allow software to be produced in short cycles, ensuring that it can be reliably released at any time.
When we look at how the current literature describes an ML project life-cycle, a picture like the one illustrated in Fig. 2 (top) is often shown. In many companies, model development and operations are carried out manually and without implementing MLOps. This slows down the industrialization of ML methodologies. As ML models have zero Return on Investment (ROI) until they can be used [25], [26], time to market should be the first metric to look at and optimize for any ML project. The only way to improve the release and continuous use of ML solutions in the industrial environment is to take great care of the part following the development of the model, in particular the interface between the ML solution and the existing Information and Communication Technologies (ICT) system. In fact, the most time-consuming step in releasing an ML solution into production is Operations (Fig. 2, bottom). It is worth noting that, although MLOps fosters process automation, its main goal is not to optimize business [27].

II. OBJECTIVES
The main objective of this paper is to provide a literature review on MLOps, with the goal to highlight current challenges in building and maintaining a ML system in a production environment [28]. At the same time, we aim at giving an overview on why MLOps were introduced to translate ML systems into production [29], [30]. To this end, we selected papers and projects in the field of MLOps and propose a first taxonomy to understand the work done so far. We identify key concepts by analyzing existing literature from 2015 to 2022. Finally, we propose our operational methodology to approach a ML project. As far as we know, this is the first effort to systematize the literature on this topic and provide its operationalization.
The main difference between the operational methodology proposed in this paper and the traditional workflow implemented in many ML projects, and shown in Fig.1, consists in the full integration of the various project steps in order to realize an effective, scalable, but above all industrializable solution. In fact, in most ML projects all forces are used for the development of an accurate ML model, without giving due importance to the integration and monitoring of the ML solution in the industrial environment. The main motivation for proposing a methodology is to try to normalize each step to bring ML models from research to production. Due to the growing interest, researchers are trying to figure out each step of MLOps without involving business partners in defining each step. This results in a misalignment in the definition of the MLOps issues without having a clear vision from the transition from research to production up to the maintenance of the models. By following a clear methodology, teams can have a deeper overview of all processes and organise each part of a project in a better and systematic way.
The rest of this paper is structured as follows: Section III reviews the related literature; Section IV presents the proposed MLOps workflow. Section V concludes the paper and suggests high-level directions for further research.

III. PROPOSED TAXONOMY
As introduced in Section I, MLOps initiatives aim to establish resilient and efficient workflows by creating robust pipelines [31], established practices [32] and auxiliary frameworks and tools. Indeed, model development is only a small part of the overall process, and many other processes, configurations and tools need to be integrated into the system [33]. Applying DevOps techniques in the context of continuous training (CT), CD, CI and continuous monitoring (CM) is among the main requirements of an ML project that aims to provide process automation, governance and agility.
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the literature, several projects have tried to tackle various aspects of the ML production process by expanding existing libraries or by creating new tools to enhance the quality and performance of specific processes or make them more insightful. Up to now, there is no standardized and common pipeline to follow for an end-to-end MLOps project. To cluster the different approaches, we propose the following taxonomy:
1) ML-based software systems, also known as model-centric frameworks. These systems focus on the architecture of ML models with a view to CI/CD [23], [34], [35]. The goal of such systems is twofold: on the one hand, to create and automate ML pipelines; on the other hand, to increase the level of automation in the ML software life-cycle [36].
2) ML use case applications where, for example, papers explain an MLOps workflow to foster collaboration and negotiation between surgeon and patient [37], [38] or the ML pipeline on the Cloud for drug discovery [39].
3) ML automation frameworks such as MLflow [40], Kedro [41] or Amazon SageMaker [42], and benchmarking frameworks such as MLPerf [43], MLModelScope [44] and Deep500 [45]. These are interesting commercial tools that are already being used in daily work practice and represent excellent ML framework automation solutions.
The following subsections review in more detail the works that fall into the three categories.

A. ML-BASED SOFTWARE SYSTEMS
Machine Learning is becoming the primary approach to solving real-world problems. Therefore, many data science teams are studying how to apply DevOps principles in industry. The ML life-cycle often involves manual steps for deploying the ML pipeline model. This approach can produce unexpected results because of the dependencies among data, preprocessing, model training, validation, and testing. The idea is to design an automated pipeline using two DevOps principles, CI and CD. The role of CI is to test and validate data, data schemas and models. The role of CD is to provide an ML pipeline that automatically deploys another ML service [23]. The ML life-cycle has different methodologies to fit different scenarios and data types. The approach most used by data mining experts is the CRoss-Industry Standard Process for Data Mining (CRISP-DM) [46], introduced in 1996 by Daimler Chrysler. Experts can borrow the standard CRISP-DM methodologies and try to apply them to the MLOps pipeline. The process typically involves two teams: ML scientists responsible for model training and testing, and ML engineers responsible for production and deployment. MLOps pipeline automation with CI/CD routines is as follows:
• Business problem analysis;
• Dataset features and storage;
• ML analytical methodology;
• Pipeline CI components;
• Pipeline CD components;
• Automated ML triggering;
• Model registry storage;
• Monitoring and performance;
• Production ML service.
One of the points of greatest attention after CI and CD is monitoring, in terms of metrics and Key Performance Indicators (KPIs), and the continuous deployment of models. This part includes model performance, data monitoring, outlier detection and explanations of historical predictions. Continuous monitoring is a process that makes it possible to understand in real time when validation performance tends to decrease. Outlier detection is key to trusting the model and keeping it healthy.
Therefore, the most important function of continuous monitoring is to ensure that model performance and the KPIs used to validate models remain high. There are many metrics to test the quality of a model, such as precision, recall, F1, and MSE. However, these metrics evaluate a model in the laboratory, regardless of the real-world context in which the model will be used. When evaluating ML models in the context of real applications, model performance metrics are not enough to establish the robustness of the models. The most basic step towards supporting such KPI-based analytics is to ensure that KPIs and model metrics are stored with a common correlation ID, so that it is possible to identify which model operations contributed to transactions with a particular KPI score [36]. Other important company-level KPIs for evaluating the performance of the model can be: time-to-market, infrastructure cost and scalability, and profitability indices such as return on sales (ROS) [47]. Unfortunately, ML models often fail to generalize outside the training data distribution [48].
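As an illustration of the correlation-ID idea, the following minimal sketch stores model operations and business KPIs under a shared identifier and joins them afterwards. It is not the scheme of [36]: the in-memory stores, the function names (`log_prediction`, `log_kpi`, `join_by_correlation_id`) and the KPI name `basket_value_eur` are hypothetical and chosen only for this example.

```python
import uuid

# Hypothetical in-memory stand-ins for a metrics store and a KPI store.
model_log = {}
kpi_log = {}

def log_prediction(model_version, score):
    """Record one model operation and return its correlation ID."""
    correlation_id = str(uuid.uuid4())
    model_log[correlation_id] = {"model_version": model_version, "score": score}
    return correlation_id

def log_kpi(correlation_id, name, value):
    """Record a business KPI under the same correlation ID."""
    kpi_log.setdefault(correlation_id, {})[name] = value

def join_by_correlation_id():
    """Pair each model operation with the KPIs observed for it."""
    return [
        {"correlation_id": cid, **model_log[cid], **kpi_log[cid]}
        for cid in model_log
        if cid in kpi_log
    ]

# A prediction is logged first; the KPI arrives later from the business side.
cid = log_prediction(model_version="v1", score=0.87)
log_kpi(cid, "basket_value_eur", 42.0)
rows = join_by_correlation_id()
```

Each joined row then links a model version and its score to the KPI value of the same transaction, which is the prerequisite for the KPI-based analysis described above.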
Finally, trust in an ML project rests on model explanations. Explainability allows users to trust the predictions, and this improves transparency. The user can verify which factors contributed to certain predictions, introducing a layer of accountability [35]. The terms "explainability" and "interpretability" are used interchangeably throughout the literature; however, in the case of an AI-based system, explainability is more than interpretability in terms of importance, completeness, and fidelity of predictions or classifications [49]. Explainable Artificial Intelligence (XAI) is a research trend that promotes explainable decision-making. Many real-world ML applications greatly increase the efficiency of industrial production through automated equipment and production processes [50]. However, the use of "black boxes" has not yet been overcome, because the models and their decisions still lack explainability and transparency [51].

B. ML USE CASE APPLICATIONS
One of the most difficult challenges is using ML in real-world applications where the focus is on system integration and scaling. MLOps use cases are set up around continuous training, continuous integration and continuous deployment [52], where new versions of the ML system can be deployed in running software. In this section, we present a case study to understand what a workflow looks like in an MLOps project. The use case concerns Oravizio [38], a software product that provides data-driven information on patient-level risks related to hip and knee joint replacement surgery. Oravizio supports the collaboration and negotiation between the surgeon and the patient, so that the decisions taken are informed and there is consent to the operation.
Oravizio provides three different dedicated prediction models:
• Risk of infection within one year from surgery;
• Risk of revision within two years from surgery;
• Risk of death within two years from surgery.
In the case of Oravizio, data were collected over the years, including 30,000 medical records, from patients who have undergone surgery. Since the number of cases is so large that no surgeon can process them manually during the appointment, these data have been used to create a risk calculation model that predicts the outcome of the surgery. The variety of data formats was one of the issues during preprocessing, which had to create a standard for later analysis [37]. Once the data are standardized, an ML model can be created for each risk to enable validation and ensure regulatory compliance. The models selected to be trained for this task were logistic regression, random forest, XGBoost, and Weibull/Cox survival models. According to the results, gradient boosting with XGBoost produced the best performance and can be selected for use in production [38].
As shown in Fig. 3, these models are usually re-trained during the life-cycle of an ML product. We have new data and this entails continuous training to improve accuracy. We also have continuous delivery in terms of deploying new models and continuous monitoring, which has two faces: some indexes to track accuracy for data science analysis, and some KPI or different indexes from the business or clinical side to help understand the model and whether this approach can improve the business.
Unfortunately, no other use cases in the literature present a clear MLOps pipeline that explains the process from problem understanding to model deployment and continuous training, delivery and monitoring. For example, in the case of the Uffizi Gallery in Florence [34], one of the most visited museums in Italy with over 2 million visitors, the project aims to reduce queues using ML, but a clear set-up of the MLOps workflow is not provided. In that article, the authors discuss the chosen architecture, the reason why an ML algorithm was used, and the run-time continuous training of the algorithm to improve performance; what is missing is a methodological guideline.

C. ML AUTOMATION FRAMEWORKS
To have a business impact, ML applications need to be deployed in production, which means both deploying a model in a way that can be used for inference (e.g., REpresentational state transfer (REST) services) and deploying scheduled jobs to update the model regularly. This is especially challenging when deployment requires collaboration with another team, such as application engineers who are not ML experts, or when the ML team uses different libraries or frameworks [40]. ML projects have created new challenges that are not present in traditional software development. One of these includes tracking input data, data versions, tuning parameters, and so on, to keep production deployment up-to-date [53]. In this section, we want to summarize these challenges and describe some of the most popular ML frameworks like MLflow, Kubeflow, MLPerf, etc. [54].
MLOps frameworks can be divided into three main areas [55] dealing with:

1) Data Management
Data labeling tools (Table 1) are used to help the data science team to label large datasets such as texts, images, etc. [56], [57]. Labeled data are used to train supervised ML algorithms. We provide an overview of some data labeling tools and advantages and disadvantages in Table 2.
Data versioning tools (Table 3), on the other hand, are used by data science and data engineering teams to manage different versions of models and datasets [58]. This helps data science teams gain insights, such as identifying how data changes impact model performance and understanding how datasets evolve. An overview of some popular data versioning tools, along with their pros and cons, is shown in Table 4.

2) Modelling
In Tables 5 and 6, we present feature engineering tools that automate the process of extracting useful features from raw datasets to create better training data [59]. These tools help speed up feature engineering and extraction.
Developing ML projects involves running multiple experiments with different models, model parameters, or training data. Experiment tracking tools save all necessary information about different experiments [60]. This makes it possible to track the versions of experiment components and results, and to compare different experiments. Some examples of experiment tracking tools are shown in Table 7. A brief summary of their pros and cons is presented in Table 8.
Hyperparameters play a major role in obtaining better models. These are the parameters of the ML training algorithms, such as the learning rate, the type of regularization applied, and so on. Hyperparameter tuning tools help automate the process of searching for and selecting the best-performing hyperparameters [61], [62]. Popular hyperparameter tuning tools are shown in Tables 9 and 10.
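To make the notion of automated hyperparameter search concrete, the following minimal sketch runs an exhaustive grid search over two hyperparameters. It does not reproduce any tool from Tables 9 and 10: the objective `validation_loss` is a toy stand-in for training a model and measuring its validation error, and the search space values are assumptions made for illustration.

```python
import itertools

def validation_loss(learning_rate, l2):
    """Toy stand-in for 'train a model and return its validation loss'."""
    return (learning_rate - 0.1) ** 2 + (l2 - 0.01) ** 2

# Hypothetical search space: candidate values for each hyperparameter.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "l2": [0.0, 0.01, 0.1],
}

def grid_search(objective, space):
    """Evaluate every combination and return the one with the lowest loss."""
    names = list(space)
    best_params, best_loss = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = grid_search(validation_loss, search_space)
```

Grid search is the simplest strategy and its cost grows multiplicatively with each added hyperparameter, which is one reason dedicated tuning tools also offer random or model-based search.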

3) Operationalization
ML model deployment tools facilitate the integration and deployment of ML models into production [63]. Some tools, with the advantages and disadvantages of each, are shown in Table 11 and Table 12. ML model monitoring is another important part of a successful ML project because ML model performance tends to decay after model deployment due to changes in the input data stream over time [64], [65]. Model monitoring tools detect data drift and anomalies over time and make it possible to set alerts in case of performance issues. An overview of some popular monitoring tools is provided in Table 13 and in Table 14, with advantages and disadvantages.
There are also tools that cover the end-to-end ML life-cycle [66]. Some popular platforms are shown in Table 15 and in Table 16, with the advantages and disadvantages of the main software.

IV. PROPOSED MACHINE LEARNING OPERATIONS METHODOLOGIES
In this section, we provide our methodology for an MLOps project that aims to unify the lessons learned from the literature review into a single framework. The main difference from the other frameworks is that we are trying to create a new standard for ML projects inspired by CRISP-DM that helps strengthen the link between research and industries. Below, the different stages of the proposed MLOps process are described. Figure 4 provides a schematic overview.

A. BUSINESS PROBLEM UNDERSTANDING
Establishing a business understanding and the success criteria for solving the problem under study is the first step in an ML project [67]. Business understanding is a non-technical phase and, for this reason, communication between data scientists and business experts is the main part of identifying the business problem. During this phase, it is essential to map the processes, systems, key data elements, and policy documentation for the key domains expressed in the business problem. This information is often created and maintained by the data governance team as part of enterprise data governance. The initial step is gathering requirements and clearly defining the objectives and key results (OKRs). In this part, data scientists should discuss with business experts to determine whether ML can really help. For each of the OKRs, it is necessary to define one or more KPIs [68]. These KPIs need to be documented for future reference and will be critically useful in ensuring that the project delivers the expected value. The KPIs must be matched to the metrics (MSE, accuracy, etc.) used by the data science team, in order to understand how model improvement impacts the business. The definition and documentation of business problems provide a key context for the subsequent phases, helping to distinguish relevant data, defining how data map into the model (both during training and deployment), and identifying which dimensions of the model performance should be monitored once the model is in production and according to what criteria [69].

B. DATA ACQUISITION
During data acquisition, the goal is simply to collect enough data to train the ML model and obtain a first solution [70]. The data scientist identifies the information, in terms of features/attributes, relevant to a specific business problem. These aspects should be discussed with a field-expert data engineer to identify potential data sources. Once the dataset is identified, the data engineer builds the pipeline that makes the data available to the data scientist. The data engineer performs the preliminary cleaning and validation steps so that there is a sufficient amount of high-quality data to meet the data scientist's needs.
The tasks for data acquisition can be summarized as follows:
• Data Extraction: select and integrate the data relevant to the ML task.
• Data Analysis: exploratory data analysis to understand the data schema and the characteristics expected by the model.
• Data Preparation: identify the data preparation and feature engineering required for the model. This preparation involves data cleaning, and splitting into training, validation, and test sets. Data transformation and feature engineering also apply to the model that solves the target task. The output of this step is data in the prepared format. For example, NULL values are converted to zero or outliers are excluded from the dataset.
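The data preparation task above can be sketched as follows. This is a minimal illustration under stated assumptions: missing values are replaced by zero as in the example in the text, while the z-score outlier rule, the 70/15/15 split ratios and the function name `prepare` are our own assumptions rather than prescriptions of the methodology.

```python
import random
import statistics

def prepare(values, z_max=3.0, seed=0):
    """Fill missing values with zero, drop outliers, and split 70/15/15."""
    filled = [0.0 if v is None else float(v) for v in values]
    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled)
    # Assumed outlier rule: keep values within z_max standard deviations.
    kept = [v for v in filled if std == 0 or abs(v - mean) <= z_max * std]
    random.Random(seed).shuffle(kept)  # seeded shuffle for reproducibility
    n_train = int(0.7 * len(kept))
    n_val = int(0.15 * len(kept))
    return kept[:n_train], kept[n_train:n_train + n_val], kept[n_train + n_val:]

# Hypothetical single-feature dataset with one NULL and one extreme value.
raw = [1.0, 2.0, None, 3.0, 2.5, 1.5, 2.0, 1000.0, 2.2, 1.8,
       2.1, 1.9, 2.4, 1.6, 2.3, 1.7, 2.0, 2.6, 1.4, 2.0]
train, val, test = prepare(raw)
```

In a real project the statistics used for cleaning would be computed on the training portion only, so that no information from the validation and test sets leaks into the model.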
When there is not enough data to train the model, two main methodologies help to work around the problem:
• Data Augmentation, a technique that increases the amount of available data by adding modified copies of existing data (e.g., in the case of images, the same image rotated, enlarged, blurred, etc.).
• Transfer Learning, which makes it possible to reuse most of the weights of a neural network already trained on a similar problem.
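As a small illustration of data augmentation, the sketch below generates rotated and mirrored variants of an image represented as a list of pixel rows. The helper names (`rotate90`, `flip_horizontal`, `augment`) are hypothetical; a real project would normally rely on an image-processing library rather than on plain lists.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def flip_horizontal(image):
    """Mirror a 2-D image left to right."""
    return [row[::-1] for row in image]

def augment(image):
    """Return the original image plus three rotations and one mirrored copy."""
    variants = [image]
    rotated = image
    for _ in range(3):
        rotated = rotate90(rotated)
        variants.append(rotated)
    variants.append(flip_horizontal(image))
    return variants

image = [[1, 2],
         [3, 4]]
augmented = augment(image)  # five training samples from a single image
```

All variants keep the label of the original image, so the transformations must be chosen such that they do not change the class (a rotated digit 6, for instance, may become a 9).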

C. ML METHODOLOGY
After data acquisition, selecting the best ML algorithms to solve the problem is a key part of the ML project. Usually, the data science team studies the state of the art for the specific problem and tries a bottom-up approach to solving it. ML is experimental by nature: different features, models, parameters and hyper-parameter configurations are tried to find what works best. The bottom-up approach typically consists in trying different models with increasing degrees of complexity until the best one is reached. This methodology helps data scientists start with simple models before trying to implement complex ones.
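The bottom-up approach can be sketched as a loop over candidate models ordered by complexity, in which a more complex model is kept only if it improves the validation error by a minimum margin. The two candidates (a constant predictor and a one-variable linear regression), the stopping rule and the parameter `min_gain` are assumptions made for this example.

```python
import statistics

def fit_constant(xs, ys):
    """Simplest model: always predict the training mean."""
    mean = statistics.fmean(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """One-variable least-squares regression."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def select_bottom_up(candidates, train, val, min_gain=0.01):
    """Walk from simple to complex; stop when the gain falls below min_gain."""
    best_name, best_err = None, float("inf")
    for name, fit in candidates:
        err = mse(fit(*train), *val)
        if best_err - err < min_gain:
            break  # added complexity does not pay off
        best_name, best_err = name, err
    return best_name, best_err

candidates = [("constant", fit_constant), ("linear", fit_linear)]
train = ([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
val = ([4.0, 5.0], [8.0, 10.0])
best_name, best_err = select_bottom_up(candidates, train, val)
```

On this toy dataset the linear model is selected because it reduces the validation error substantially with respect to the constant baseline.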

D. ML TRAINING AND TESTING
The process of training and optimizing a new ML model is an iterative process in which data scientists test several algorithms, features and hyperparameters. Once the best ML models have been chosen, they are re-trained and tested. The models are evaluated using different validation methods such as:
• Holdout validation, a type of external validation in which the dataset is randomly split into two subgroups.
• Cross-validation, in which the original sample is randomly partitioned into k equal-sized subgroups. Of the k subgroups, one subsample is kept as the testing dataset and the remaining k − 1 are used for training.
• Bootstrap validation, in which we resample the dataset with replacement, producing new datasets with the same number of instances as the initial dataset.
The output of this step is a set of metrics for evaluating the quality of the model. Once this iteration is complete, the weights of the best models are saved and deployed using an API infrastructure. Training and testing an ML system involves integration, data validation, trained model quality evaluation, and model validation. The main goal is to keep track of all experiments and maintain reproducibility while maximizing code reusability [71]. We have seen that there exist different tracking tools which can simplify the process of storing the data, the features selected, and model parameters along with performance metrics. These make it possible to compare differences in performance and aid the reproducibility of the experiments. Without reproducibility, data scientists are unable to deliver the model to DevOps to see if what was created in the lab can be faithfully reproduced in production [72].
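The three validation schemes listed above differ only in how sample indices are assigned to training and testing. The following minimal sketch generates such index sets; the function names, the 20% holdout fraction and the fixed seeds are assumptions made for illustration, and folds are exactly equal-sized only when the number of samples is divisible by k.

```python
import random

def holdout_split(n, test_fraction=0.2, seed=0):
    """Randomly split indices 0..n-1 into a training and a testing part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(test_fraction * n))
    return idx[n_test:], idx[:n_test]

def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement; unused indices form the test set."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag

train_idx, test_idx = holdout_split(10)
folds = list(k_fold_splits(10, k=5))
boot_idx, oob_idx = bootstrap_sample(10)
```

Fixing the seed, as done here, is one simple way to keep the splits reproducible across experiments, in line with the reproducibility goal stated above.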

E. CONTINUOUS INTEGRATION
Continuous integration is a well-established development practice in the software development industry [52] and is the first step in starting the continuous delivery journey. CI enables companies to have frequent releases, and improve software quality and teams' productivity [73]. This practice includes automated software building and testing [74].
In the continuous integration pipeline, we build source code and run various ML trained models. The outputs of this stage are components (packages and artifacts) to be deployed in the pre-production/production environment of continuous delivery [75]. The ML code is only a small portion of a real ML system; infrastructure, configuration and data processing are important components as well. Continuous integration for ML systems therefore has a substantial impact on the end-to-end pipeline, automating the delivery of ML models with minimal effort. The main steps for continuous integration are [22]:
• Source code management (SCM);
• Push/pull changes to the repository to trigger a continuous delivery build;
• Check of the latest code and associated data version from the data repository storage;
• Running of the unit tests;
• Building/running of the ML model;
• Testing and validation;
• Packaging of the model and building of the container image;
• Pushing of the container image to the registry.
Several software tools have been used for the deployment of ML models, such as Jenkins [76], Git [77], Docker [78], Helm [79], and Kubernetes [80]. To summarize, the pipeline and its components are built, tested, and packaged when new code is committed or pushed to the source code repository. CI tests and validates code, datasets, data schemas and models. The validated model is deployed to a target environment to provide predictions. This deployment can be one of the following:
• Microservices with a REST API to provide online predictions;
• A model embedded into an edge or mobile device;
• Part of a batch prediction system.
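The validation of data schemas and models performed by CI can be sketched as a set of checks that a pipeline job could run before packaging. This is an illustrative example only: the schema `EXPECTED_SCHEMA`, the accuracy threshold and the function names are hypothetical and do not correspond to the API of any of the tools cited above.

```python
# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"age": int, "income": float, "label": int}

def validate_schema(rows, schema):
    """Return a list of human-readable schema violations (empty if none)."""
    errors = []
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                errors.append(f"row {i}: {column} is not {expected_type.__name__}")
    return errors

def quality_gate(accuracy, threshold=0.8):
    """Assumed model check: accuracy on a validation set must reach a threshold."""
    return accuracy >= threshold

def ci_checks(rows, accuracy):
    """Collect all failures; an empty list means the build may proceed."""
    errors = validate_schema(rows, EXPECTED_SCHEMA)
    if not quality_gate(accuracy):
        errors.append(f"accuracy {accuracy:.2f} is below the gate")
    return errors

good_rows = [{"age": 41, "income": 10.0, "label": 1}]
bad_rows = [{"age": "41", "income": 10.0}]
passing = ci_checks(good_rows, accuracy=0.90)
failing = ci_checks(bad_rows, accuracy=0.75)
```

In a CI server such checks would typically be wrapped in unit tests so that a non-empty error list fails the build and blocks the packaging step.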

F. CONTINUOUS DELIVERY
Continuous delivery has the goal of ensuring that an application is always in a production-ready state after successfully passing the automated tests and quality checks [81]. The objective of the deployment stage is to enable a seamless roll-out of new models, with the lowest possible risk. Best practices in the continuous delivery of software services involve the use of safe deployment techniques, such as A/B tests. CD is an ML pipeline that should automatically deploy model services. CD employs a set of practices, such as CI and deployment automation, to automatically deliver software in production [82]. CD is a push-based approach [83], and this practice has reduced deployment risk, lowered costs, and provided faster user feedback.
In this phase, the artifacts produced by the preceding continuous integration stage are deployed in the staging/pre-production/production environment. Test models are obtained from this phase. The components of the CD pipeline are summarized as follows:
• Staging environment: deploying the trained ML model first in a staging environment is a standard operation in ICT. The output of this step is a test model that is pushed into the model registry archive.
• Model registry archiving: it is necessary to define an archiving location where ML models in the staging state and ML models in the production state are loaded.
• Automatic activation: this step is performed automatically according to a schedule or in response to an event in the production environment. The output of this phase is a test model that is pushed into the staging environment.
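The role of the model registry in the CD pipeline can be illustrated with a minimal in-memory sketch that tracks which model versions are in the staging state and which one is in production. The class `ModelRegistry`, its methods and the additional "archived" state for replaced models are assumptions made for this example; real registries persist this information and store the model artifacts as well.

```python
class ModelRegistry:
    """Hypothetical in-memory registry mapping model versions to a stage."""

    def __init__(self):
        self._stages = {}

    def register(self, version):
        """New models always enter the registry in the staging state."""
        self._stages[version] = "staging"

    def promote(self, version):
        """Move a staged model to production, archiving the previous one."""
        if self._stages.get(version) != "staging":
            raise ValueError(f"{version} is not in staging")
        for other, stage in self._stages.items():
            if stage == "production":
                self._stages[other] = "archived"  # assumed extra state
        self._stages[version] = "production"

    def in_stage(self, stage):
        return [v for v, s in self._stages.items() if s == stage]

registry = ModelRegistry()
registry.register("v1")
registry.promote("v1")
registry.register("v2")      # v2 waits in staging while v1 serves traffic
before = (registry.in_stage("production"), registry.in_stage("staging"))
registry.promote("v2")       # e.g. after automated tests have passed
after = (registry.in_stage("production"), registry.in_stage("archived"))
```

Keeping the replaced version in the registry makes a rollback possible if the newly promoted model misbehaves in production.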

G. CONTINUOUS TRAINING
During continuous training, we need to keep storing new data and preparing them in the same way as the data used to train our model. This also means detecting outliers to understand when the data distribution diverges from the training data. CT is concerned with automatically retraining and serving models [84]. Continuous training is the part of MLOps that automatically and continuously retrains models before they are redeployed.
To design a continuous training strategy, we should answer the following questions [85]:
• When should a model be retrained?
-- Trigger based on data changes.
• How much data is needed for retraining?
-- Fixed window.
• When to deploy the model after retraining?
-- A/B testing.
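The first two answers (a trigger based on data changes and a fixed window of retraining data) can be combined in a small sketch. The drift rule used here, a shift of the window mean by more than `max_shift` reference standard deviations, as well as the class name `RetrainingTrigger` and its parameters, are assumptions chosen for simplicity; [85] does not prescribe a specific test.

```python
from collections import deque
import statistics

class RetrainingTrigger:
    """Hypothetical trigger: fire when recent data drift from the reference."""

    def __init__(self, reference, window_size=5, max_shift=2.0):
        self.ref_mean = statistics.fmean(reference)
        self.ref_std = statistics.pstdev(reference)
        self.window = deque(maxlen=window_size)  # fixed window of recent data
        self.max_shift = max_shift

    def observe(self, value):
        """Add one production value; return True when retraining should start."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # not enough data yet
        shift = abs(statistics.fmean(self.window) - self.ref_mean)
        return shift > self.max_shift * self.ref_std

    def retraining_data(self):
        """The fixed window doubles as the dataset for the next training run."""
        return list(self.window)

trigger = RetrainingTrigger(reference=[9.0, 10.0, 11.0, 10.0, 10.0])
stream = [10.0, 10.0, 11.0, 9.0, 10.0, 15.0, 15.0, 15.0]
flags = [trigger.observe(value) for value in stream]
```

In this example the trigger fires once enough shifted values have entered the window; the retrained model would then be compared with the current one, for instance through A/B testing, before deployment.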

H. CONTINUOUS MONITORING
The main objective during the monitoring stage is to manage the risks of the in-production models by checking for performance drift [86] and alerting an operator when model accuracy has dropped. The model predictive performance is monitored to potentially invoke a new iteration in the ML process. Once the model has been deployed to production, it still needs continuous validation or testing because patterns in the data can change over time. The model may become less accurate because the data used in training the model are no longer representative of the new data existing in production [71]. Performance monitoring does not concern only the quantitative performance metrics. Therefore, during continuous monitoring, both the technical metrics and the business KPIs must be kept under control.
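A minimal sketch of the alerting logic described above is given below: accuracy is computed over a sliding window of labelled production outcomes and compared with the baseline measured at validation time. The class name `AccuracyMonitor`, the window size and the tolerance are hypothetical values chosen for illustration; a complete monitor would track business KPIs in the same way.

```python
from collections import deque

class AccuracyMonitor:
    """Hypothetical monitor comparing rolling accuracy with a baseline."""

    def __init__(self, baseline, tolerance=0.1, window_size=4):
        self.baseline = baseline      # accuracy measured before deployment
        self.tolerance = tolerance    # accepted drop before alerting
        self.outcomes = deque(maxlen=window_size)

    def record(self, prediction, actual):
        """Store one labelled outcome; return an alert message or None."""
        self.outcomes.append(prediction == actual)
        if len(self.outcomes) < self.outcomes.maxlen:
            return None  # wait until the window is full
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.baseline - self.tolerance:
            return (f"accuracy dropped to {accuracy:.2f} "
                    f"(baseline {self.baseline:.2f})")
        return None

monitor = AccuracyMonitor(baseline=0.9)
pairs = [(1, 1), (0, 0), (1, 1), (1, 1), (1, 0)]  # (prediction, true label)
alerts = [monitor.record(p, a) for p, a in pairs]
```

The last outcome lowers the rolling accuracy below the accepted level, so only the final call returns an alert for the operator.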

I. EXPLAINABLE AI
Deep Learning methods [87] now dominate benchmarks on different tasks and achieve superhuman results. This improvement has often been achieved through increased model complexity. Once these models became real applications in production, the community started studying the "explainability" of the models to answer business questions. Explainability can be defined as "the degree to which a human can understand the cause of a decision" [88]. Explainability is mostly connected with the intuition behind the outputs of a model [89]; therefore, an ML system is explainable when it is easier to identify cause-and-effect relationships within the system inputs and outputs. For example, in image recognition tasks, part of the reason that led a system to decide that a specific object is part of an image (output) could be certain dominant patterns in the image (input). The more explainable a model is, the greater the understanding practitioners get in terms of internal business procedures that take place while the model is making decisions. An explainable model does not necessarily translate into one that humans are able to understand (internal logic or underlying processes) [90]. The explainability of the model allows the user to build trust in the predictions made by the deployed system and improves transparency. The user can verify which factors contributed to certain predictions, introducing a layer of accountability [15].

J. SUSTAINABILITY: CARBON FOOTPRINT
The increasingly common use of Deep Learning models in real-world projects has, on the flip side, brought an immense growth in the computation and energy required [91]. If this trend continues, Deep Learning could become a significant contributor to climate change. The trend can be mitigated by exploring how to improve the energy efficiency of DL models [92]. Hence, data scientists need to know their energy and carbon footprint, so that they can actively take steps to reduce them whenever possible. Carbon footprint is a measure of the exclusive total amount of carbon dioxide emissions that are directly and indirectly caused by an activity or accumulated over the life stages of a product [93]. Strubell et al. focused their carbon footprint analysis on AI models for natural language processing [94]. For example, the emissions from training an NLP Transformer model were estimated to be equivalent to those of a commercial flight between San Francisco and New York. The publication of these estimates has had a significant effect in the scientific world. Subsequently, the 2020 White Paper on AI released by the European Commission called for actions that go beyond the collection of impressive but admittedly anecdotal data about the training of selected AI systems [95]. For this reason, it is necessary to calculate the carbon footprint of each individual AI system and of the AI sector as a whole [96].
It is important to emphasize that, during the MLOps lifecycle, carbon footprint should be taken into account when choosing models. It is preferable to take a bottom-up approach, trying simple models first rather than jumping straight to complex and expensive state-of-the-art models. Likewise, the carbon footprint should be calculated not only during training and testing, but also during continuous integration, continuous delivery and continuous training.
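A rough way to track the footprint across lifecycle stages is to convert measured power draw and runtime into energy, then multiply by the carbon intensity of the electricity used. The sketch below assumes this simple model (energy in kWh = average power in W × hours × data-centre PUE / 1000; emissions = energy × grid intensity). It is not the estimation procedure of any cited work, and every name and number in it is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass
class StageRun:
    """One compute job in the MLOps lifecycle (all fields are illustrative)."""
    stage: str              # e.g. "training", "continuous integration"
    avg_power_watts: float  # average hardware power draw during the job
    hours: float            # wall-clock duration of the job


def energy_kwh(run, pue=1.0):
    """Energy used by a run in kWh, scaled by the power usage effectiveness."""
    return run.avg_power_watts * run.hours * pue / 1000.0


def emissions_kg_co2e(run, grid_intensity_kg_per_kwh, pue=1.0):
    """Estimated emissions of a run in kg CO2e for a given grid carbon intensity."""
    return energy_kwh(run, pue) * grid_intensity_kg_per_kwh


def footprint_by_stage(runs, grid_intensity_kg_per_kwh, pue=1.0):
    """Sum estimated emissions per lifecycle stage."""
    totals = {}
    for run in runs:
        totals[run.stage] = totals.get(run.stage, 0.0) + emissions_kg_co2e(
            run, grid_intensity_kg_per_kwh, pue
        )
    return totals


# Made-up example values, not measurements.
runs = [
    StageRun("training", avg_power_watts=300.0, hours=10.0),
    StageRun("continuous training", avg_power_watts=300.0, hours=2.0),
    StageRun("continuous training", avg_power_watts=300.0, hours=2.0),
]
print(footprint_by_stage(runs, grid_intensity_kg_per_kwh=0.5))
```

Logging such per-stage totals next to accuracy metrics makes it possible to compare a simple model against a more complex one on both performance and estimated emissions.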

V. CONCLUSIONS
In this paper, we have surveyed MLOps approaches in the literature, provided a taxonomy of the current literature, and proposed a methodology for addressing MLOps projects. The application of DevOps principles to ML and the use of MLOps in the industrial environment are still little-discussed topics at the academic level. Current literature is mostly disconnected and sporadic. This paper is intended as a literature review to systematize and add clarity to the definition and methods of MLOps. The paper aims to define a high-level strategy for dealing with MLOps projects; the goal of future work is to apply our proposed methodology to use cases such as biomedical imaging and finance. Experimental work will be required to test the pipeline defined in this manuscript.
Data preparation, model training and testing, and performance comparison are the key points of traditional pipelines. In this work, we have stressed the importance of many other, no less important aspects, such as continuous monitoring and sustainability. Following well-defined guidelines is the only way to allow the traceability and reproducibility of the results obtained in an Open Science context. For this reason, it is crucial to use systematic procedures, so that the scientific community can converge on clear and clean pipelines in MLOps. The remaining challenge for the community is to apply an ML methodology to an end-to-end use case, going through each point of the methodology and showing what happens if some phases are skipped. Specific areas, such as biomedicine, finance, cyber-security, and manufacturing [97], can greatly benefit from adopting MLOps, and we believe the pipeline defined in this paper can bring advantages over traditional practices.
According to Fortune Business Insights, the global Machine Learning market is expected to grow from $15.50 billion in 2021 to $152.24 billion in 2028, a compound annual growth rate of 38.6% over the forecast period. MLOps aims to create long-term ML solutions, reduce maintenance costs, and monitor and optimize workflows. Understanding and intercepting new challenges and trends such as the emerging MLOps will provide a strong competitive advantage to companies adopting this solution [98].

ABBREVIATION TERMS

This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication.