A Federated Learning Framework for Enforcing Traceability in Manufacturing Processes

The plethora of data available in various manufacturing facilities has boosted the adoption of data analytics methods tailored to a wide range of operations and tasks. However, data fragmentation, in the sense that chunks of data may be distributed across geographically sparse areas, hampers the generation of better and more accurate intelligent models that would otherwise benefit from the larger quantities of data derived from the various operations taking place at different locations of a manufacturing process. Moreover, in regulated industrial sectors, such as the medical and pharmaceutical fields, sector-specific legislation imposes strict criteria and rules for the privacy, maintenance and long-term storage of data. Process reproducibility is often an essential requirement in these regulated sectors, and it can be supported by AI models that enforce traceability, auditability and integrity of every initial, intermediate and final piece of data used during the AI model training process. In this respect, blockchain technologies could also be useful for enabling and enforcing such requirements. In this paper, we present a multi-blockchain-based platform integrated with federated learning functionalities to train global AI (deep learning) models. The proposed platform maintains an audit trail of all information pertaining to the training process using a set of blockchains, in order to ensure the training process's immutability. The applicability of the proposed framework has been validated on three tasks by applying three state-of-the-art federated learning algorithms to an industrial pharmaceutical dataset based on two manufacturing lines, achieving promising results in terms of both generalizability and convergence time.


I. INTRODUCTION
Nowadays, a wide set of Industry 4.0 technologies, such as the Industrial Internet of Things (IIoT), cyber-physical systems, Process Analytical Technologies (PAT), Artificial Intelligence (AI) and blockchain structures, have been employed in various types of manufacturing industries to address the increased demand for smart manufacturing solutions [1]. Industrial companies are progressively relying on large volumes of data for monitoring and optimizing their internal production processes [2]. In modern production environments, large datasets often have to be utilised, and they can be widely distributed across multiple, geographically distinct industry locations, while local data privacy may also be at risk. Consequently, data gathering and analysis, as well as decision making, can be challenging issues and, in such cases, modern machine learning solutions that combine cutting-edge technologies, like deep learning and blockchains, are appropriate.
In a contemporary Industry 4.0 environment, thousands of sensors collect data continuously (24/365), enabling industrial production chain supervisors and managers to better plan their production processes, as well as to detect and anticipate possible process failures that could hinder the facility's well-being and effective maintenance. At the same time, the ever-increasing volume of data necessitates novel and more sophisticated AI methods for data processing and analysis, which complement the widely adopted simple statistical and non-parametric methods [3].
In addition to data volume-related problems, modern industries are often required to conduct their operations by following strict regulations which are introduced by international regulatory bodies and organisations. Such regulations aim to ensure the quality of the produced products, and to guarantee the transparency and validity of the manufacturing processes. For example, aerospace companies are obligated by law not only to ensure top-notch quality of their products, but also to preserve all data and metadata produced by the totality of the manufacturing operations [4]. Such information may prove crucial in case of failures occurring due to production mishaps. In these unfortunate cases, a manufacturing production process is often required to be reproduced, in an end-to-end manner. Possible causes of failures have to be identified and necessary measures have to be taken to ensure that failures and undesired events will never happen again.
Another type of industry that is nowadays heavily monitored and regulated is pharmaceutical manufacturing. Pharmaceutical production companies often use empirical data to identify patterns, and they apply test procedures for analyzing the efficacy of drug treatments. Many controls and tests are performed at each pharmaceutical production step, and every individual company is responsible for responding promptly to government or institutional requests, in order to ensure process validity and transparency at any time, sometimes even many years after a drug lot has been produced and put on the market. Blockchain-based approaches have already proven their capacity to guarantee data integrity, traceability and accountability, and for this reason they are increasingly adopted in application fields, such as pharmaceutical manufacturing environments, that necessitate such properties [5], [6].
Moreover, industry sectors, like pharmaceutical production, come with restrictions concerning data privacy and, as such, more often than not, data-related processes, such as data relocation and data transmission, are difficult to perform and are subject to many pitfalls. Dataset fragmentation constitutes a fundamental limitation to quickly training a deep learning model that is as accurate as possible [7]. A major characteristic of federated learning is that it achieves collaborative training by coordinating and parallelizing computation among multiple, heterogeneous computational nodes, sensors and data streams. Federated learning distributes the model training, as well as the processing burden, among multiple computational nodes and, in that way, it minimizes the operational and computational overhead, whilst at the same time data privacy is enhanced [8].
This paper presents an approach for acquiring, processing and analyzing heterogeneous real-time data streams in the context of pharmaceutical manufacturing lines. The approach is conjoined with federated learning capabilities for training deep learning models that reside in sparse, remote locations. Specifically, the proposed approach trains local models on independent production lines' data, aiming to extract and interpret data distributions on disjoint datasets. To ensure the integrity of local training processes, every model's node employs a blockchain containing the training data and the intermediate data derived during training. Subsequently, excerpts of this local information are transmitted to a common global blockchain, accessible by all nodes, which is used for global model training and for ensuring (end-to-end) the learning process's integrity, immutability and reproducibility. Whenever a regulated company is under an internal or external audit, it can rebuild the complete chain of production steps while ensuring the integrity of this reconstruction, and thus gain important insight into the reasons why a specific failure condition occurred, aiding managers and decision makers in taking mitigating actions. The fine-grained training information stored in the blockchains can also be exploited whenever a pharmaceutical manufacturing corporation needs to add or expand its production lines, or to collaborate with other corporations: the knowledge gathered by analyzing existing production lines can be used as a base to train new models faster and with better accuracy. The proposed method was tested using a dataset provided by a pharmaceutical manufacturing company which participates as an industrial partner in the ''Smart Pharmaceutical Manufacturing'' (SPuMoNI) EU-funded research project [9]. The data in this dataset were produced by various sensors installed across two manufacturing lines of the factory floor, measuring parameters such as temperature, humidity and air pressure. The experimental evaluation covered missing data imputation, outlier detection and predictive maintenance, and in all three cases it showed promising results regarding the generalizability and convergence of the utilised deep learning models.
Summarizing, the contributions of the current paper are as follows:
• Blockchains are used locally to store information about the training data and enhance local data privacy, and globally to store intermediate training parameters;
• The method can employ a variety of existing federated learning algorithms to train models on distributed datasets;
• The auditing mechanism for the training data and the training process data permits both the reconstruction of an exact copy of a previously trained model and the backtracking to the initial steps of the training process in case such information is needed (e.g., for institutional auditing);
• A comparative performance evaluation of federated learning and transfer learning is provided, using public datasets and real manufacturing data.
The rest of the paper is structured as follows: the next section briefly describes the SPuMoNI project, Section III presents the related work, and Section IV describes the integration of existing federated learning methods with blockchains in an IoT sensor-based industrial production environment. Section V is dedicated to the performance evaluation of the aforementioned applications using different federated learning strategies. Finally, Section VI gives conclusions and directions for future work.

II. THE SPUMONI PROJECT
The main goal of the SPuMoNI research project is the development of a technological framework for the establishment of data integrity in representative pharmaceutical industries. The work performed in the SPuMoNI project aims to ensure the quality and integrity of the large amounts of data produced by computerised systems in pharmaceutical manufacturing environments. In particular, the project focuses on: 1) designing data quality assessment models based on specific data quality dimensions agreed by the European Institute for Innovation Through Health Data, including data integrity rules derived from regulatory compliance documents, and 2) identifying behaviour patterns of data probability distributions over time, locating outliers and predicting data integrity violations. In the SPuMoNI project two types of data are considered. The first concerns product types and the associated recipe information, such as the list of materials and the batch quantity. The second concerns machinery and equipment control information and operational device parameters, such as temperature, pressure, centrifuge velocity, current intensities/voltages and agitator speed, together with their corresponding alarm signals.
Raw data attributes are represented in various formats such as numerical data with single values, time-series measurements or non-numerical and categorical data, such as the case with chemical/physical properties of materials/substances.
The requirements analysis performed in the SPuMoNI project revealed the need for a unifying method capable of considering the various types of data produced in a pharmaceutical manufacturing environment, while also ensuring data integrity.

III. RELATED WORK
The integration of federated learning techniques with blockchain structures is usually applied to IoT infrastructures organized in the multi-layered architectures of industrial environments. Traditionally, such architectures consist of two layers: (i) the lower production plant edge layer, enriched with multiple sensors to acquire and transmit large quantities of data in real time; and (ii) the upper cloud layer, where a central server provides services for storing, processing and analyzing massive volumes of production-related data.
Data integrity is a critical issue for the design, implementation and usage of any information system that stores, processes and retrieves data in the context of an industrial manufacturing environment. Data accuracy, validity, completeness, availability and consistency should be ensured over the whole production life cycle [10]. In addition, data integrity is directly related to end-to-end traceability and autonomous real-time monitoring. Thus, advanced AI-based data analytics can be useful for supporting fast data processing as well as effective monitoring and decision making [10]. Likewise, dedicated machine learning and deep learning approaches can be useful for managing large amounts of data [11].
Advanced data predictive analytics methods can be also applied in the context of smart manufacturing architectures for supporting data integrity and quality control decisions. The resulting benefit is to reduce process deviations and failed batch runs, and also support early fault prediction [12]. In particular, deep learning methods have been efficiently applied for performing fault classification and diagnosis in industrial rotating machinery [13], fault detection in chemical process development in pharmaceutical industry [14], as well as for improvement of process industry planning [15].
Deep learning methods have been also used to predict the remaining useful life (RUL) of machines in industrial environments [16]. These methods offer a great predictive potential when they are combined also with soft sensors (i.e., software sensors), while the input data to predict a target quantity (output) are acquired by signals from hardware sensors and actuator devices [17], [18].
Conventional deep learning techniques usually handle centralized data streams by exploiting a single central processing entity (e.g., a cloud server) for model training as well as for handling data privacy issues and the overhead of massive raw data communications. Conventional centralized data processing approaches often fail in big data processing scenarios met in industrial environments, where parallel and distributed data streams are provided from multiple and geographically separated production lines [19].
Federated learning schemes have shown promising results when fed with large and distributed datasets created by various sensors and IoT devices in industrial applications [20], [21], [22]. A federated learning approach might therefore be considered an effective solution for combining heterogeneous datasets, especially when these emanate from different production lines in a pharmaceutical manufacturing environment. Moreover, federated learning techniques can be integrated with blockchain technologies, aiming to mitigate data privacy issues [23], [24], [25] and to promote or ensure data transparency, immutability and integrity [26], [27], with such integrations of learning techniques and blockchains already reported in the literature [28]. However, to the best of our knowledge, none of the existing works that combine federated learning with blockchain tackle the problem of training process integrity and the exact reconstruction of a previously trained model. For this reason, we propose a federated learning-based architecture capable of gathering information about every single step of the training process, while at the same time supporting the secure and private storage of all intermediate training data produced. In addition to the intermediate training data, a multitude of other parameters can also be stored in the blockchain, permitting in that way the deterministic reconstruction of a trained federated learning model.

IV. FEDERATED DEEP LEARNING-BASED ANALYTICS ON RAW SIGNAL DATA
Federated learning aims at building and training strong global predictive models, starting from weaker or smaller local models, without the need to replay all data processed by each local model. Instead of transmitting the whole dataset to a centralized computing system, in a federated deep learning scenario only intermediate parameters or local gradients are transmitted and, depending on the integration strategy, the main model is trained indirectly by applying simple mathematical aggregation formulas. In a large-scale industry, where multiple production facilities are spread over possibly disparate locations, such an approach is suitable, since each facility has access to only some parts of the overall dataset. In addition, data transmission is not always possible, due to privacy issues, business policies or simply because the sheer volume of the data makes its transmission difficult.
By training models using data exclusive to single facilities and then combining their individual predictive abilities, industries can obtain more accurate and generalizable models. Therefore, whenever a manufacturing industry needs to integrate a new production line, it can exploit the knowledge gathered by the existing lines; Fig. 1 illustrates the integration of a new production line that reuses the previously created models.

A. TRACKING THE LEARNING PROCESS WITH BLOCKCHAINS
Being able to reproduce a model identical to another one can be important in cases where there is a need to backtrack to every single stage of the training process. Data integrity and immutability are critical requirements in any regulated industry, such as pharmaceutical manufacturing businesses; therefore, the method presented herein aims at reproducing an identical twin model, using exactly the same parameter values.
To achieve this, the method stores and integrally keeps the following pieces of information regarding the local models (a minimal sketch of such a record is given after the list):
• Training sample data;
• Training batches;
• Initial network parameters (which are usually random, depending on the initialization strategy);
• (Optionally, if used) Optimizer state parameters.
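To make the stored record concrete, the following minimal Python sketch illustrates one possible shape for this per-model audit information; the class, field and function names are illustrative assumptions of ours and not part of the actual platform.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of a byte payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LocalGenesisRecord:
    """Illustrative content of the initial local blocks (names are ours, not the platform's)."""
    training_sample_hashes: list[str]        # one hash per training sample
    batch_composition: list[list[int]]       # sample indices forming each training batch
    initial_parameter_hash: str              # hash of the randomly initialised network weights
    optimizer_state_hash: str | None = None  # optional, only if an optimizer state is used


# Hypothetical usage with toy data:
samples = [b"sensor-row-0", b"sensor-row-1", b"sensor-row-2"]
record = LocalGenesisRecord(
    training_sample_hashes=[sha256_hex(s) for s in samples],
    batch_composition=[[0, 1], [2]],
    initial_parameter_hash=sha256_hex(b"serialized-initial-weights"),
)
print(json.dumps(record.__dict__, indent=2))
```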

1) LOCAL MODEL TRAINING
During model initialization, the node maintaining the model generates an initial sequence of blocks containing the aforementioned information. The structure of such blocks is shown in Fig. 2. During local model training, at each step of the process, whenever a mini-batch is defined, the hash values of the training samples are appended and the hash value of the whole mini-batch is computed and assigned. This is done, instead of hashing the whole raw training data contained in the mini-batch, for efficiency purposes. The list of the training samples, together with the mini-batch's hash value, is then included in a block and appended to the local blockchain (Fig. 3). This operation takes place whenever a new batch is prepared and forwarded into each local model.
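As a rough illustration of the mini-batch hashing step described above, the sketch below hashes the concatenation of the precomputed per-sample hashes rather than the raw sample data; the function names and payload layout are assumptions of ours, not the platform's actual implementation.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def minibatch_block_payload(sample_hashes: list[str]) -> dict:
    """Build the payload of a mini-batch block: the list of per-sample hashes plus a
    single hash of their concatenation (instead of re-hashing the raw sample data)."""
    batch_hash = sha256_hex("".join(sample_hashes).encode("utf-8"))
    return {"sample_hashes": sample_hashes, "batch_hash": batch_hash}


# Toy example: three samples whose hashes were computed when the dataset was registered.
precomputed = [sha256_hex(s) for s in (b"row-17", b"row-42", b"row-99")]
print(minibatch_block_payload(precomputed)["batch_hash"])
```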
After each forward step, the corresponding loss is calculated and the backward model training step and parameter update are performed. The updated values of the model parameters are then hashed, packed in a block and stored in the local blockchain. However, given that model parameters may occupy a relatively large space, depending on the model, the platform offers the option of not saving the parameters of the local models at each step, but rather of specifying the number of steps after which each local node should append the model parameters to the local blockchain. This is a global parameter, i.e., it is the same for all local nodes that take part in the training process.
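The periodic parameter snapshot can be sketched as follows, assuming PyTorch models and a hypothetical `append_to_local_blockchain` call (left as a comment); the snapshot interval value is also an assumption.

```python
import hashlib
import io

import torch
import torch.nn as nn

SNAPSHOT_INTERVAL = 10  # global setting: append parameters every k steps (assumed value)


def parameter_hash(model: nn.Module) -> str:
    """Serialize the model's state_dict and hash it, so a compact fingerprint can be
    stored in a block instead of the full parameter tensors."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return hashlib.sha256(buffer.getvalue()).hexdigest()


model = nn.Linear(4, 1)
for step in range(1, 31):
    # ... forward pass, loss computation, backward pass and optimizer step happen here ...
    if step % SNAPSHOT_INTERVAL == 0:
        block_payload = {"step": step, "param_hash": parameter_hash(model)}
        # append_to_local_blockchain(block_payload)  # hypothetical platform call
        print(block_payload)
```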

2) GLOBAL MODEL
In addition to the local blockchains, the platform also includes a global blockchain. This blockchain preserves the training information necessary to ensure that both the local and the global model training processes remain original and integral. Local blocks are added to this blockchain to store parameters and metadata about the data samples that were used during the corresponding training step. Global blocks, instead, are generated by a smart contract which checks whether the currently available local models have submitted their parameters to the blockchain. The smart contract (Alg. 1) maintains a list of all the participating nodes in the federated learning setting and, when the last node submits, it randomly chooses the next aggregator node from this list and writes it as a block in the global blockchain. As soon as the block is transmitted to the rest of the blockchain nodes, a blockchain event fires with the aggregator node as the only receiver. The aggregator node then executes the selected parameter aggregation strategy and adds a global block to the blockchain that contains the most up-to-date version of the parameters of the global model, as depicted in Fig. 4. This block also contains an update event that notifies all registered federated nodes that a parameter update has occurred.
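The aggregator-selection behaviour of the smart contract (Alg. 1) can be approximated by the following plain-Python simulation; in the real platform this logic runs as a smart contract on the global blockchain, and the class, method and event names used here are our own illustrative choices.

```python
import random


class AggregatorSelectionContract:
    """Plain-Python simulation of the aggregator-selection logic described above;
    the actual platform implements it as a blockchain smart contract."""

    def __init__(self, participant_ids: list[str], seed: int | None = None):
        self.participants = list(participant_ids)
        self.submitted: set[str] = set()
        self.rng = random.Random(seed)

    def submit_parameters(self, node_id: str, param_block_id: int) -> str | None:
        """Record that a node submitted its parameter block; when the last node submits,
        pick the next aggregator at random (a global block + event in the real system)."""
        self.submitted.add(node_id)
        if self.submitted == set(self.participants):
            self.submitted.clear()
            aggregator = self.rng.choice(self.participants)
            # write_global_block({"aggregator": aggregator, "last_block": param_block_id})
            # emit_event("AggregatorSelected", receiver=aggregator)   # hypothetical calls
            return aggregator
        return None


contract = AggregatorSelectionContract(["line_I600", "line_I1000"], seed=0)
contract.submit_parameters("line_I600", param_block_id=101)
print(contract.submit_parameters("line_I1000", param_block_id=102))
```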
Using a blockchain node as a central point for communication and data storage ensures: 1) that the model is updated with integral, untampered data; and 2) that local values are committed before the global model update.
These two characteristics ensure that, at the end of an update cycle, all models will contain up-to-date parameters. In case a node fails for a limited amount of time (in training epochs), the model can fetch the updates it missed by reading the data in the last global block. The training process data in the blockchain can be validated by reading the local blocks between two consecutive global blocks, applying the federation strategy and comparing the result with the values of the following global block (Fig. 5).
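A possible validation routine, assuming a simple averaging federation strategy and PyTorch tensors for the stored parameters, is sketched below; the actual strategy to re-apply depends on the one recorded for the training run.

```python
import torch


def validate_global_block(local_states: list[dict], global_state: dict,
                          tol: float = 1e-6) -> bool:
    """Re-apply an averaging federation strategy to the local parameter blocks found
    between two global blocks and compare the result with the following global block."""
    recomputed = {
        name: torch.mean(torch.stack([s[name] for s in local_states]), dim=0)
        for name in global_state
    }
    return all(torch.allclose(recomputed[n], global_state[n], atol=tol) for n in global_state)


# Toy check with two local parameter sets and their recorded average.
a = {"w": torch.tensor([1.0, 2.0])}
b = {"w": torch.tensor([3.0, 4.0])}
recorded = {"w": torch.tensor([2.0, 3.0])}
print(validate_global_block([a, b], recorded))  # True
```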
Furthermore, holding information regarding the learning process of a distributed environment enables the selection of subsets of data which can be used in new model updates. Whenever the model training process presents problems (for example, in case of noisy/wrong data or wrong settings), the operator can ignore the problematic updates and recreate a new global model based only on the parameter updates deemed accurate or correct.
Both the local and the global blockchains operate using a Proof-of-Authority (PoA) consensus mechanism. This choice was made because the other available types of consensus mechanisms do not fit the context of the SPuMoNI platform. In fact, the Proof-of-Work consensus mechanism requires dedicated mining nodes for validating transactions, with the consequent unnecessary energy expenditure. On the other hand, Proof-of-Stake introduces the concept of ''node importance'', which does not fit a completely distributed peer-to-peer network with equal peers.

B. APPLICATION TO A PHARMACEUTICAL MANUFACTURING FACILITY
To demonstrate the functionality of the federated learning method, we integrated and tested three different deep learning models that deal with the following tasks:

1) MISSING VALUE IMPUTATION (MVI)
Missing values can occur in environments where large numbers of sensors acquire and transmit data. Frequently, they come in the form of NaN or Null values, which data analytics methods are not able to process.
In this context, the SPuMoNI platform requires an efficient and accurate method for missing value imputation in time-series data, since it deals with a large number of sensors, and deep Transformers are the current state of the art for time-series processing [29], [30]. The transformer-based approach [31] employs a flexible deep network trained through an iterative process to extract and model long- and short-term time dependencies and produce reliable data. At each input sequence in the time-series x(t), a mask M is applied to hide a part of it, producing the masked input x(t, i). The network then attempts to reconstruct the masked parts by minimizing the Mean Squared Error (MSE) between the masked values of the input sequence and the predicted output sequence y(t, i), as expressed by Equation 1:

L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( x(t, i) - y(t, i) \right)^2    (1)
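A minimal PyTorch sketch of the masked MSE objective of Equation 1 is given below; the tensor layout (a flat sequence with a binary mask) is a simplifying assumption of ours.

```python
import torch


def masked_mse_loss(prediction: torch.Tensor, target: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    """MSE computed only on the positions hidden by the mask M: the network must
    reconstruct the masked parts of the input sequence."""
    diff = (prediction - target) ** 2
    return (diff * mask).sum() / mask.sum().clamp(min=1)


# Toy sequence: hide two positions and score a (hypothetical) model output.
x = torch.tensor([1.0, 2.0, 3.0, 4.0])      # original sequence x(t)
mask = torch.tensor([0.0, 1.0, 0.0, 1.0])   # 1 where values were hidden
y = torch.tensor([1.0, 2.5, 3.0, 3.5])      # predicted sequence y(t)
print(masked_mse_loss(y, x, mask))          # mean error over the two masked positions
```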

2) OUTLIER DETECTION (OD)
Outlier detection is relevant for data analytics, especially in operations occurring in manufacturing lines. Outlier sensor data may indicate an incorrect line configuration or a mechanical, electric or electronic equipment malfunction. In addition, outlier values can occur when everything is operating normally but, for some reason, the product quality has been compromised. Such occurrences must be identified and corrected as soon as possible, especially in critical industrial sectors, such as the pharmaceutical one, where human health is involved. While fixed-threshold methods represent soft or hard margins for outlier detection, values close to a margin that do not violate the threshold can also hide anomalous situations that influence performance both in the short and in the long term. Therefore, the platform integrates a deep neural network based on the transformer network architecture. For this network, we employed the Binary Cross-Entropy loss (Equation 2):

L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ t_i \log(y_i) + (1 - t_i) \log(1 - y_i) \right]    (2)

where t_i denotes the ground-truth outlier label of item i. As input, the network accepts a numerical sequence x with items x_i of length N. The output y is a sequence that represents the probability of each item being an outlier.
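For completeness, the per-item Binary Cross-Entropy of Equation 2 can be computed with PyTorch's built-in loss, as in the short sketch below (toy values only).

```python
import torch
import torch.nn as nn

# Equation 2: binary cross-entropy between per-item outlier probabilities and labels.
bce = nn.BCELoss()

probs = torch.tensor([0.05, 0.92, 0.10, 0.80])   # network output y: P(item is an outlier)
labels = torch.tensor([0.0, 1.0, 0.0, 1.0])      # ground-truth outlier flags t
print(bce(probs, labels))
```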

FIGURE 5. Federated learning method: each production line includes a neural network that is trained locally (A through D). During the update cycle, each network transmits its weights or gradients (depending on the federated learning method) to the blockchain. When all models in the production lines commit their updates, a smart contract fires and informs the models that an update cycle has been completed, including the numbers of the blocks that contain the updated parameters. The models then read these blocks and update their parameters accordingly.

3) MANUFACTURING LINE HEALTH CONDITION MONITORING AND PREDICTIVE MAINTENANCE
Using past data for training, a Long Short-Term Memory (LSTM)-based recurrent neural network, in a many-to-one configuration, predicts the next fault in the production line as well as its current health condition based on the raw data. This allows optimizing maintenance interventions, aiming to minimize both ''dead'' production times and the risk of a malfunction occurring.
The LSTM-based recurrent neural network continuously monitors raw sensor data in order to identify short- and long-term dependencies. It can pinpoint production line malfunctions that are expressed by sporadically appearing, but repeated and consistent, abnormal patterns. This problem was tackled both as a regression and as a classification problem. In particular, given a set of past raw sensor data, for each time step the model outputs the next production line failure, minimizing the MSE loss between the predicted value (p) and the actual one (y), as described by Equation 3:

L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( p_i - y_i \right)^2    (3)

The health condition, instead, was tackled as a classification problem. In particular, the same neural network is trained by minimizing the Categorical Cross-Entropy loss (Equation 4) to make a three-class prediction (Good, Intermediate, Bad):

L_{CE} = -\sum_{c=1}^{3} y_c \log(p_c)    (4)

where p represents the probability distribution output by the Softmax function, applied at the output of the last linear layer of the network, and y is the actual value. Finally, the total loss L that the network is trained against is given by the sum of L_{CE} and L_{MSE}, as represented in Equation 5:

L = L_{CE} + L_{MSE}    (5)

Having an estimate of when the condition of a manufacturing line will deteriorate to the point where it is no longer cost-effective to keep it running, we can predict the most suitable maintenance time, achieving the best trade-off between maximizing the efficacy of the intervention and minimizing the amount of idle time in the manufacturing line.
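The combined objective of Equations 3 through 5 can be sketched in PyTorch as follows; note that `CrossEntropyLoss` applies the Softmax internally, so it is fed the raw logits of the last linear layer, and all numeric values are purely illustrative.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()          # Equation 3: regression head (next failure)
ce = nn.CrossEntropyLoss()  # Equation 4: classification head (Good/Intermediate/Bad)


def total_loss(pred_failure: torch.Tensor, true_failure: torch.Tensor,
               condition_logits: torch.Tensor, condition_label: torch.Tensor) -> torch.Tensor:
    """Equation 5: the network is trained against the sum L = L_CE + L_MSE."""
    return ce(condition_logits, condition_label) + mse(pred_failure, true_failure)


# Toy example for a single time step (values are illustrative only).
pred_failure = torch.tensor([12.3])                   # predicted time to the next failure
true_failure = torch.tensor([10.0])
condition_logits = torch.tensor([[2.0, 0.5, -1.0]])   # logits; Softmax is applied inside CE
condition_label = torch.tensor([0])                   # class 0 = "Good"
print(total_loss(pred_failure, true_failure, condition_logits, condition_label))
```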

C. FEDERATED LEARNING APPROACHES
Depending on many factors, gradient or parameter aggregation strategies sometimes work better and sometimes worse than others, meaning that there is no single federation approach that universally outperforms all other existing methods. In the case of trivial tasks, for example outlier detection, all parameter update methods work more or less equally well, mostly because of the simplicity and the low variance of the datasets. Conversely, on more difficult tasks, like missing value imputation, the methods tend to perform with varying degrees of success.
We have tested the proposed method with three well-established federated learning methods (a generic aggregation sketch is given after the list):
1) Federated Stochastic Gradient Descent (FedSGD) [32], where in each training step the gradients are sent to the model aggregator and subsequently averaged. The global model parameters are updated using the average gradient, normalized to the number of samples on each local node;
2) Federated Averaging (FedAvg) [33], where the local models are permitted to conduct local parameter updates and the updated parameters are then transmitted to the model aggregator, averaged and used for the global model update step;
3) Federated Learning with Dynamic Regularization (FedDyn) [34], which keeps track of the global loss trend and then applies dynamic regularization, aiming at making the local model losses converge to the global loss.
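As a generic illustration of the kind of aggregation these strategies perform, the following sketch implements a FedAvg-style weighted parameter average in PyTorch; it is not the exact aggregation code used by the platform.

```python
import torch


def fedavg_aggregate(local_states: list[dict], sample_counts: list[int]) -> dict:
    """FedAvg-style aggregation: weighted average of the locally updated parameters,
    weighted by the number of samples on each local node (a generic sketch)."""
    total = float(sum(sample_counts))
    aggregated = {}
    for name in local_states[0]:
        stacked = torch.stack([n * s[name] for s, n in zip(local_states, sample_counts)])
        aggregated[name] = stacked.sum(dim=0) / total
    return aggregated


# Two local nodes with different dataset sizes.
node_a = {"w": torch.tensor([1.0, 1.0])}
node_b = {"w": torch.tensor([3.0, 3.0])}
print(fedavg_aggregate([node_a, node_b], sample_counts=[100, 300]))  # -> [2.5, 2.5]
```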

V. PERFORMANCE EVALUATION

A. DATASET AND HYPERPARAMETER DESCRIPTION
All data used in this research were provided by a pharmaceutical company which participates in the SPuMoNI project. These data were obtained from two production lines that were properly configured to pass sensor values throughout the production process. These values are mainly physicochemical parameters such as temperature, humidity, speed, pressure, etc. While the first production line dataset (I1000) contains 176 independent drug batches, the second production line dataset (I600) contains 296 batches. Each batch is treated as a new cycle of the production process: when a batch ends, a new cycle of drug production starts, producing chronologically consecutive batch production orders. For the missing value imputation, raw data values have been replaced randomly with NaN values, covering the different missing value scenarios considered (referred to as MRV and MRB below). The data splits were defined as a per-batch division of the raw signals (i.e., each individual batch's data was exclusively included in only one split). Therefore, the dataset was partitioned into training (70%), validation (10%) and testing (20%).
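A simple way to realise such a per-batch split is sketched below; the batch identifiers and the random seed are illustrative assumptions.

```python
import random


def per_batch_split(batch_ids: list[str], seed: int = 0) -> dict[str, list[str]]:
    """Split whole production batches (never individual rows) into train/val/test,
    using the 70/10/20 proportions described above; the seed is an assumption."""
    rng = random.Random(seed)
    ids = list(batch_ids)
    rng.shuffle(ids)
    n = len(ids)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


splits = per_batch_split([f"I1000_batch_{i}" for i in range(176)])
print({k: len(v) for k, v in splits.items()})  # roughly 123 / 17 / 36 batches
```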
For the health condition monitoring and predictive maintenance task, we used the same data but, this time, the whole set was chronologically concatenated and divided into 30 consecutive maintenance time-periods. The labels of line conditions and the dates of maintenance interventions were assessed and provided by three experts who were responsible for the actual maintenance operations of the production lines that produced the data used in this work.
For the outlier detection part of the evaluation, the concatenated dataset has been divided into three per-batch splits that contained 60%, 20% and 20% for training, validation and testing, respectively.
The hyperparameters used for training the various models differ by case and are listed in Table 1. Table 2, instead, reports important statistical information about the dataset values used in the performance evaluation. All experiments were executed on a PC with a Ryzen 7 5800X processor, 64 GB of main RAM and an Nvidia Titan X GPU with 12 GB of VRAM.

B. FEDERATED LEARNING EVALUATION SCENARIO
To evaluate the effectiveness of the proposed method, first, we simulate a pharmaceutical production environment with one existing manufacturing line. Then, we add a new line. In particular, we train a model for each of the above-described tasks with a single dataset and store all intermediate parameter updates of these first models in the blockchain.
The performance of the models is evaluated and then, using the last global block in the blockchain (which is identical to the last local block, as there is only one model), an exact copy of the model is added to the system, representing the new production line (transfer learning). The models then continue their training with their respective datasets and the training performance of both lines' models is evaluated. The transfer learning scenario is shown visually in Fig. 6.
Finally, we train the two models from scratch, representing our federated learning scenario. In this case, the models are initialized randomly and then take training steps as dictated by the currently tested federated learning strategy, among those described in the previous section. In all cases, convergence was considered achieved at the epoch in which the validation loss had not decreased by more than 0.5% over an interval of 5 epochs.
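The convergence rule can be expressed as a small helper like the one below (a sketch; the exact bookkeeping in our experiments may differ).

```python
def has_converged(val_losses: list[float], window: int = 5,
                  min_rel_improvement: float = 0.005) -> bool:
    """Convergence rule used in the experiments: stop when the validation loss has not
    decreased by more than 0.5% over an interval of 5 epochs."""
    if len(val_losses) <= window:
        return False
    reference = val_losses[-window - 1]
    best_recent = min(val_losses[-window:])
    return (reference - best_recent) / reference <= min_rel_improvement


losses = [1.00, 0.80, 0.70, 0.683, 0.6825, 0.682, 0.6818, 0.6817, 0.6816]
print(has_converged(losses))  # True: under 0.5% improvement in the last 5 epochs
```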

1) SINGLE MODEL
Training the models separately, each with its corresponding dataset, yields the performance shown in Table 3. The performance obtained in this setting is considered as a baseline for the other experiment settings. All models converged eventually (Figs. 7, 8 and 9) but, in particular, the MVI settings needed more than 250 epochs of training (Fig. 7). The MAE achieved in the MVI task was equal to 2.6 for I1000 and 2.4 for I600. In the MVI task with isolated manufacturing lines, training performance was similar in the two cases; however, compared with the other two tasks, MVI needed many more epochs to achieve convergence. In the predictive maintenance task with single manufacturing lines, both models achieved convergence after 100 epochs.

2) TRANSFER LEARNING
In this setting, we fully train a model in a conventional way and, after it achieves convergence, a completely new model is added. The new model downloads the parameters stored in the last global block of the blockchain and starts its training process. Given that, of the two production lines, the I600 converged generally faster and achieved better performance indicators, we first train each network and, when it achieves convergence, we add the other one. In this manner, we are able to evaluate the importance of using a more robust model as a base against a weaker one.
As expected, the training of the second models converged faster when using transfer learning. With respect to the baseline, we observe a reduction in the number of epochs needed for the convergence of the second model in all tests (Figs. 10, 11, 12 and 13). This reduction is substantial, highlighting the importance of transfer learning in general. Also, in all settings, when using the models trained on the I600 manufacturing line data as a base, the second models, trained on the I1000 data, converged faster and achieved better performance indicators. This means that using a more robust model as a base for transfer learning yields better results, both in terms of convergence time and of performance indicators.
Regarding the performance indicators, it is clear that federated learning offers a net performance gain of at least 2% in the MVI task and 6% in the predictive maintenance task. As seen in Table 4, using 10-fold leave-one-out cross-validation, the average increase in MAE is 3.15% and 6.15% in the MRV and MRB MVI tests, respectively, and 5.6% in the predictive maintenance test, while there is a decrease of 0.5% in the outlier detection test. This decrease in performance can be explained by the data itself: outlier detection is a model- and setting-specific task, and the data and learning parameters of other models and settings can be seen as contamination. Nevertheless, depending on the criticality of the application, the gain in network training time (in epochs) can justify this slight decrease in F1-score and render it acceptable.

3) FEDERATED LEARNING
In this setting the models start training from scratch and updates occur periodically on the global model.
Regarding convergence times, there is a clear advantage of using federated learning when compared against single model training (Tab. 5 and Figs. 14, 15 and 16).

FIGURE 14. Concurrently training both models for the MVI task, we observe a reduction in the number of epochs of about 39% (average for both I600 and I1000).

FIGURE 15. Concurrently training both models for the outlier detection task, we observe a reduction in the number of epochs of about 51% (average for both I600 and I1000).

The performance indicators in this setting behaved almost identically to the previous one (Tables 6, 7 and 8). However, while there was an increase in their values, only the FedDyn algorithm is at best comparable to transfer learning. The only substantial difference came from the outlier detection task, where we observed a less pronounced decrease in performance. This possibly means that, by updating a model's parameters in small steps when the datasets do not come from the exact same data distribution, the models manage to discriminate between same- and non-same-distribution data and to ignore (or, better, mitigate) confounding updates.
Also, given that the number of local datasets is low (only two), with each containing data belonging to a different distribution, we expect better performance when the number of local datasets is increased.
For this reason, we compared the performance of the three federated learning algorithms on the Server Machine Dataset (SMD) [35], using the same transformer-based classifier. The SMD dataset includes data from 28 different local sets, each containing 38 dimensions. We omitted testing transfer learning on this dataset because catastrophic forgetting [36] would become significant after training the same model sequentially on 28 different datasets. On this dataset, FedDyn yielded a significant performance advantage w.r.t. both FedAvg and FedSGD (Tab. 9).
Our results indicate that, if maximum performance is required and the number of available local datasets is low, then transfer learning is the best choice. On the contrary, if there is a sufficient number of available local datasets, then federated learning is superior to transfer learning both in terms of training efficiency and of performance. Regarding training efficiency, in particular, there is no substitute for federated learning, as it permits the parallel training of the models, minimizing the total running time of a training experiment.

VI. CONCLUSION
In this paper we presented a platform for federated learning on manufacturing data coming from pharmaceutical production lines, which uses multiple blockchains both as the mechanism for sharing parameter updates and as a means to guarantee data and learning process integrity, accountability and immutability, properties that are necessary in a heavily regulated industrial setting. Federated learning can be used both to reduce the time needed for model training and to increase the corresponding performance metrics. The evaluation of the method demonstrated that federated learning could and should be employed in the pharmaceuticals industry whenever the best compromise between training efficiency and performance is requested.
ISAAK KAVASIDIS is a Researcher with the South-East European Research Centre (SEERC) and an Assistant Researcher with the University of Thessaly, Greece. In 2014, he participated in the Marie Curie RELATE ITN Project as an Experienced Researcher. He has coauthored more than 40 scientific articles in peer-reviewed international journals and conferences. His current research interests include medical data processing and brain data processing using machine and deep learning methods and the decoding of human brain functions and transfer to computerized methods.

EFTHIMIOS LALLAS is currently an Associate Professor with the School of Technology, University of Thessaly, Greece, where he is also a member of the Applied Informatics and Digital Technologies (AIDigiLab) Laboratory. Previously, he was a Core Network Deputy Director of Huawei. He has participated in several major European research programs. He is the author and a peer-reviewer of many scientific papers and monographs in international conferences and journals. His research interests include network system technologies (broadband and optical wireless-sensor IoT systems), network-on-chip (NoC) architectures, network computing system design, techno-economical evaluation, network system security, blockchain, cognitive devices and systems, and intelligent home networks.
GEORGIOS MOUNTZOURIS received the degree in computer science and the M.Sc. degree in software engineering for internet and mobile applications. He is currently a Software Engineer with Volos Port Authority S. A., and a Research Software Engineer with the School of Technology, University of Thessaly, Greece. His current research interests include knowledge representation and reasoning, multi-agent systems, agent oriented software engineering, graph algorithms, and parallel and distributed algorithms.
VASSILIS C. GEROGIANNIS received the Diploma degree in computer engineering and the Ph.D. degree in software engineering from the University of Patras, Greece. He is a Professor and the Head of the Department of Digital Systems, University of Thessaly (academic subject: analysis and design of systems and projects, with an emphasis on decision making). He was an Adjunct Professor with the Hellenic Open University. He was a Visiting Professor with the IPAG Business School, France, and the Siauliai State University of Applied Sciences, Lithuania. Since 1992, he has been participating in several projects. He is the author/coauthor of more than 130 papers published in international journals/conference proceedings, which have received a plethora of citations. He is the coauthor/an editor of three scientific books. He is a member of the Management Board of the Hellenic National Academic Recognition Information Centre (NARIC); the Council for Research and Innovation, Thessaly; the Management Committee of the Entrepreneurship Innovation Research Institute, Research Center IASON, University of Thessaly; the Management Committee of the Technical Chamber Central and Western Greece; the Central Assembly of the Technical Chamber of Greece; and the Scientific Committee of Electronics Engineers in the Technical Chamber of Greece. He has received the best paper award in three international conferences. He was the conference chair/the program chair and an invited speaker in international conferences. He was also a member of editorial boards, a guest editor, and a reviewer for international journals.
ANTHONY KARAGEORGOS received the degree in applied mathematics and computer science and the Ph.D. degree in agent-based software engineering. He is currently a Professor of intelligent information systems with the School of Technology, University of Thessaly, Greece; and the Head of the Applied Informatics and Digital Technologies (AIDigiLab) Laboratory, where his main research involvements are the development of intelligent systems for personalized e-commerce and mass manufacturing. He has received funding for research projects in Greece and EU, which resulted in patents and several international journal and conference publications. He has extensive experience in self-organizing systems and decentralized intelligence. His current research interests include combining immersive reality and decentralized intelligence, self-organizing and intelligent systems, and the IoT and data analytics.