Smart Pharmaceutical Manufacturing: Ensuring End-to-End Traceability and Data Integrity in Medicine Production

Article history: Received 6 April 2020 Received in revised form 21 August 2020 Accepted 29 November 2020 Available online xxxx


Introduction
The pharmaceutical industry is consistently improving its manufacturing processes [1] in compliance with good manufacturing practices [2]. However, inappropriate practices and medicine falsification remain real problems. In the first 10 […]cal products that claim to prevent, detect, treat or cure COVID-19." Controlling and tracing the pharmaceutical manufacturing process are the primary motivations of the SPuMoNI research project.
SPuMoNI aims to deliver innovative scientific approaches to establish and assure constant proof of the authenticity of pharmaceutical manufacturing data to support dynamic data quality, compliance, and auditability. The SPuMoNI consortium benefits from previous collaborations among its participants in multi-agent architectures for life sciences [5].
The European Union (EU) legislation in the pharmaceutical sector is compiled in the EudraBook [6] (Directive 2001/83/EC: Medicinal products for human use). In particular, the Falsified Medicines Directive 2011/62/EU [7] introduces harmonised European measures which aim to fight medicine falsification and ensure that the trade in medicines is rigorously controlled. This directive is the legal framework which defines obligatory safety features and record-keeping requirements to impose stricter controls on medicine manufacturing. Moreover, the European Union maintains a dedicated information repository on falsified medicines.1 The pharmaceutical industry arguably requires effective techniques to control and trace the medicine manufacturing process, since there is no guarantee that current instruments and methods are not susceptible to falsified or falsifiable data in the pharmaceutical data streams. To ensure their integrity, pharmaceutical data assets should comply with: (i) the Attributable, Legible, Contemporaneous, Original, and Accurate (ALCOA) principles [8,9, chapter 1, subchapter C]; (ii) European Medicines Agency (EMA) regulations; and (iii) Food and Drug Administration (FDA) regulations. In this scenario, pharmaceutical manufacturing needs novel autonomous auditing and control mechanisms for data capture, governance, and compliance to guarantee transparency, traceability, and data authenticity. These mechanisms should include effective data quality techniques to ensure non-falsified and non-falsifiable data and to detect random or systematic acquisition errors in multiple manufacturing data streams.
This paper presents the methodology of the EU-funded SPuMoNI project for addressing these challenges within the pharmaceutical industry. The project aims to contribute open software systems and best practices for data integrity and traceability underpinned by the ALCOA principles. The SPuMoNI framework includes: (i) data quality controls specifically designed to hinder data falsification; (ii) traceability assurances that security, privacy, compliance, and ownership concerns have been properly met; and (iii) intelligent control, coordinated data gathering, and processing within a number of contexts and environments.
This work documents the applicability of the SPuMoNI methodology to industry-grade pharmaceutical manufacturing data and environments as follows:
• Data quality assurance includes temporal and multi-source data variability analysis methods and data consistency quantification methods to ensure the accuracy and trustworthiness of manufacturing information;
• Multi-agent systems have been adopted to implement an intelligent control mechanism which ensures seamless data gathering and manipulation, as well as flexible data integrity checks close to the data source; these checks enable early notification of the system operators about any data discrepancies and thus reduce the delays and costs that may be incurred in the manufacturing process; and,
• Finally, blockchain ensures end-to-end verification of the pharmaceutical process via its traceability and immutability properties. In this context, SPuMoNI integrates a private Ethereum network hosted and managed by the consortium to mitigate security threats.
1 https://ec.europa.eu/health/human-use/falsified_medicines_en (Last Accessed: 2/Jan/2021).
The remainder of the paper is structured as follows. Section 2 provides background on the underlying techniques for this work. Section 3 presents a literature review on the application of blockchain, data quality mechanisms, and intelligent agents in the pharmaceutical industry. Section 4 describes the architectural specification of the SPuMoNI project. Section 5 contains an evaluation concerning these three areas of expertise. Finally, Section 6 presents conclusions, analysis, and a discussion of future work.

Background
The pharmaceutical industry continually assesses the electronic data produced by its manufacturing processes and related activities to ensure the integrity of medicines and, ultimately, the safety and well-being of patients. In this context, pharmaceutical data assets should comply with data quality principles and international regulations, i.e., the ALCOA principles. The medicine manufacturing process involves the creation of a batch number which encodes its manufacturing history. The ultimate goal of SPuMoNI is to contribute to patient safety and well-being by ensuring medicine traceability and the data integrity (ALCOA compliance) of batch numbers. To address this challenge, the SPuMoNI methodology combines blockchain, data quality assurance, and intelligent agents.
This section provides background on the pharmaceutical manufacturing process as well as on the main areas of expertise within SPuMoNI: (i) batch numbers and ALCOA principles; (ii) blockchain technologies; (iii) data quality assurance; and (iv) intelligent agents.

Batch numbers and ALCOA principles
Batch numbers The pharmaceutical manufacturing process is marshalled through a batch number. Batch numbers are represented by any distinctive combination of letters and/or numbers which traditionally encode the complete history of the manufacturing, packaging, labelling, and/or holding of a medicament intended to have uniform character and quality. According to the EudraBook [6], a batch comprises all the units of a pharmaceutical form which are made from the same initial quantity of material and have undergone the same series of manufacturing and/or sterilisation operations or, in the case of a continuous production process, all the units manufactured in a given period of time.
Batch numbers are not internationally harmonised, as they are typically determined individually by each manufacturer. Consequently, this may hinder quality audits, compliance checks, and product recalls. Nonetheless, EU Member States are required to operate a system to collect information useful in the surveillance of medicinal products, with particular reference to adverse reactions in human beings, and to evaluate such information scientifically [10]. Therefore, there is an increasing international emphasis on improving pharmaceutical manufacturing traceability via emerging technologies in accordance with existing legal and regulatory standards [11].
ALCOA principles Data integrity must ensure that data records are: (i) authentic, immutable, and transparent, i.e., the data remains unchanged and cannot be deleted; (ii) traceable or auditable, i.e., audit trails must exist for all the data; and (iii) safe, i.e., the data is protected against unauthorised access and data corruption. Therefore, data must be collected and maintained in a secure manner, such that they are:
-Attributable to the person generating the data;
-Legible and permanent;
-Contemporaneous: data is created/recorded when the activity is performed;
-Original record (or 'true copy');
-Accurate.
Such characteristics are widely known as the ALCOA principles. Within the pharmaceutical industry, data integrity is ensured by the ALCOA principles [8]. Moreover, data integrity is further safeguarded through appropriate data quality and risk management systems (including adherence to scientific principles) and good documentation practices. As the link between the ALCOA principles and data quality is well established [12] and highly regulated [13], data records are required by law to meet the primary ALCOA requirements; therefore, all new software and hardware systems implemented in pharmaceutical manufacturing lines should be ALCOA compliant and have dedicated documentation.
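To illustrate how some of these principles can be carried by a data record itself, the following minimal Python sketch attaches ALCOA-style metadata to a manufacturing record. The class `AlcoaRecord`, its fields, and the helper `is_true_copy` are hypothetical and are not part of SPuMoNI; "Attributable" is modelled by an operator ID, "Contemporaneous" by a timestamp taken at creation, and "Original" by a SHA-256 digest that allows a copy to be checked as a 'true copy'. "Accurate" and permanence require domain validation and durable storage, which this sketch does not model.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AlcoaRecord:
    """Hypothetical manufacturing record annotated along ALCOA lines."""
    operator_id: str                                    # Attributable: who generated the data
    activity: str                                       # the manufacturing step performed
    value: str                                          # Legible: plain, human-readable text
    recorded_at: str = field(default_factory=_utc_now)  # Contemporaneous: set at creation

    def digest(self) -> str:
        """Original: SHA-256 fingerprint of the canonical record content."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def is_true_copy(copy: AlcoaRecord, original_digest: str) -> bool:
    """A copy counts as a 'true copy' only if its digest equals the original's."""
    return copy.digest() == original_digest
```

Any change to a field, even a single digit of the value, yields a different digest, so the copy no longer qualifies as a true copy.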

Blockchain technologies
Widely described as immutable time-stamped data structures, blockchains implement peer-to-peer networks where participants can verify interactions concurrently using decentralised consensus protocols. A blockchain is formed from a series of "blocks", each of which contains a cryptographic hash of the previous block, thereby creating a distributed ledger. Therefore, blockchain is a promising technology for the pharmaceutical industry, since its main characteristics (security, authenticity, immutability, and transparency) support end-to-end verification.
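The hash linking between blocks can be sketched in a few lines of Python. This is a minimal illustration assuming SHA-256 over a JSON encoding of each block; Ethereum itself hashes RLP-encoded block headers with Keccak-256, and the block fields used here (`index`, `prev_hash`, `data`) are hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def append_block(chain: list, data: dict) -> None:
    """Link a new block to its predecessor through the predecessor's hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64   # genesis has no parent
    chain.append({"index": len(chain), "prev_hash": prev, "data": data})


def verify_chain(chain: list) -> bool:
    """Every block must reference the hash of the block immediately before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )
```

Editing the data of any block changes its hash, so the `prev_hash` stored in the following block no longer matches and `verify_chain` returns `False`; hiding the edit would require rewriting every subsequent block.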
The cryptographic linking of blocks ensures that the information contained in a block can only be altered by modifying all subsequent blocks. Data is stored in the blockchain as transactions, which are organised within each block as a Merkle tree and become tamper-proof once validated by the nodes of the network. In this context, a blockchain eliminates the need for a centralised authority and enables disintermediation: its peer-to-peer nature allows secure transactions using a distributed authority that validates the process through consensus algorithms [14].
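A Merkle tree summarises all transactions of a block in a single root hash, so that changing any transaction changes the root. The sketch below computes a plain binary Merkle root in the style of Bitcoin (duplicating the last node on odd-sized levels); it is an illustration only, since Ethereum actually organises transactions and state in Merkle Patricia tries.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Hash the leaves, then hash adjacent pairs level by level up to one root."""
    if not transactions:
        return sha256(b"")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:            # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because the root is stored in the block, a validator only has to compare one 32-byte value to detect that any of the block's transactions has been modified.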
Ethereum Arguably the most popular blockchain-based distributed computing platform, Ethereum enables developers to implement decentralised, transaction-based systems using a trustful framework [15]. Additionally, it enables the creation of smart contract agreements between peers via transaction-based state transitions. Ethereum also provides a cryptocurrency, "Ether", which can be exchanged between accounts and used to compensate nodes for the computations they perform; the platform has shown significant scalability, particularly for private networks [16]. Each Ethereum operation consumes an amount of gas, a unit that measures the computational effort required to perform it; the fee paid to execute a transaction is the gas consumed multiplied by the gas price set for that transaction.
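The relationship between gas and fees can be shown with simple arithmetic. The sketch below uses Ethereum's legacy fee rule (gas used times gas price, as in force before EIP-1559 introduced a base fee); the 20 gwei gas price is an assumed example value, while 21,000 gas is the intrinsic cost of a plain Ether transfer.

```python
WEI_PER_ETHER = 10**18
GWEI = 10**9
PLAIN_TRANSFER_GAS = 21_000   # intrinsic gas of a simple Ether transfer


def legacy_fee_wei(gas_used: int, gas_price_wei: int) -> int:
    """Pre-EIP-1559 transaction fee: gas consumed times the price per unit of gas."""
    return gas_used * gas_price_wei


fee = legacy_fee_wei(PLAIN_TRANSFER_GAS, 20 * GWEI)    # assumed 20 gwei gas price
print(fee, fee / WEI_PER_ETHER)                         # 420000000000000 0.00042
print(legacy_fee_wei(PLAIN_TRANSFER_GAS, 0))            # 0: a zero gas price means no fee
```

The last line corresponds to the cost-free set-up of a private network in which transactions are submitted with a gas price of 0: gas is still consumed and bounded by the block gas limit, but no Ether is charged.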
Consensus algorithms These algorithms are used to achieve agreement on data transactions in a distributed peer-to-peer environment, ensuring that the next block to be added to a blockchain is unique and reliable. The process by which network nodes validate transactions and append new blocks is called mining. Multiple consensus algorithms have emerged to ensure the authenticity, integrity, and consistency of blockchains. Proof of Work (PoW) [17] is the first known consensus algorithm, and it is used in Bitcoin, the most popular blockchain implementation [18]. Several other consensus algorithms, such as Proof of Stake (PoS) [19] and Proof of Authority (PoA) [20], have been proposed to address the high energy consumption and computational work required by PoW. In particular, PoA is well suited to private networks where the identities of all nodes/validators are known. It has been explored mainly to track supply chain, logistics, or manufacturing processes.
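The asymmetry that makes PoW costly (finding a solution is expensive, checking it is cheap) can be seen in a toy version of the algorithm. This sketch searches for a nonce whose SHA-256 digest starts with a given number of hexadecimal zeros; it is not Ethereum's Ethash algorithm, and the difficulty scheme is a simplifying assumption.

```python
import hashlib


def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    """Brute-force a nonce whose digest starts with `difficulty` hex zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution needs only one hash computation."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

Each additional zero multiplies the expected number of hash attempts by 16, which is why PoW consumes so much energy; PoA removes this search entirely and relies on a set of known, authorised signers instead.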
Smart contracts Blockchain technologies enable the deployment of pieces of software, known as smart contracts, without involving a trusted third-party entity [21]. A smart contract is a computer program whose methods make decisions according to the rules defined in the contract. A smart contract incorporates a dedicated data structure to store, replicate, or update blockchain transactions in the distributed network. While conventional contracts require a centralised authority and involve significant time and cost, smart contracts eliminate this central authority by automating the negotiation between entities, processes, or assets [22]. We build upon our previous work on smart contracts for trust and reputation [23,24] to automate their secure distributed deployment.
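To show the kind of rules a contract method enforces, the following Python class mimics a simple batch-registry contract in memory. It is a hypothetical stand-in written for illustration only (smart contracts in SPuMoNI are written in Solidity, and this is not the project's contract): only authorised accounts may write, and recorded steps are append-only because no update or delete method exists.

```python
class BatchRegistry:
    """Hypothetical in-memory stand-in for a batch-registry smart contract."""

    def __init__(self, owner: str):
        self.owner = owner
        self.writers = {owner}
        self._records: dict[str, list[dict]] = {}

    def authorise(self, caller: str, account: str) -> None:
        # Rule: only the contract owner may grant write access.
        if caller != self.owner:
            raise PermissionError("only the owner can authorise writers")
        self.writers.add(account)

    def record_step(self, caller: str, batch_id: str, step: str, value: float) -> int:
        # Rule: only authorised accounts may append to a batch history.
        if caller not in self.writers:
            raise PermissionError(f"{caller} is not an authorised writer")
        entries = self._records.setdefault(batch_id, [])
        entries.append({"step": step, "value": value, "by": caller})
        return len(entries) - 1            # index of the new entry

    def history(self, batch_id: str) -> tuple:
        # Return copies so callers cannot alter the stored entries.
        return tuple(dict(e) for e in self._records.get(batch_id, []))
```

On a real blockchain, the caller identity would come from the transaction signature rather than a function argument, and the state would be replicated by every node.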
SPuMoNI is leveraging blockchain technologies to better ascribe and ensure the traceability of medicine batches throughout the entire manufacturing process to enable procedural auditability and compliance with international regulations.

Data quality assurance
Data quality assurance is key to improving performance in the pharmaceutical industry and to implementing good practices such as the ALCOA principles. Research into data quality has gained significant attention since the work by Wang and Strong [25]. Following their approach, many studies have sought to define which data characteristics relate to its quality, generally known as data quality dimensions. Several systematic reviews have sought agreement on the data quality dimensions to be assessed in data repositories [26,27]. The literature highlights problems such as missing information, inconsistency among individual observations, and incorrect or outdated information. Moreover, Sáez et al. [28] argue that classical statistics are not suitable for different types of data (e.g., numerical and categorical variables) or for multimodal data. In addition, they are not appropriate for massive data, given the dependency of the results on the sample size. To address these problems, the authors propose methods which compare the probability distributions of variables between different data sources or between different time periods. Specifically, the methods use the Jensen-Shannon distance (JSD), derived from the Jensen-Shannon divergence, a symmetric and smoothed version of the Kullback-Leibler divergence [29].
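For two discrete distributions P and Q (for example, histograms of the same variable from two sources or two periods), the Jensen-Shannon divergence averages the Kullback-Leibler divergences of P and Q from their mixture M = (P + Q)/2, and the JSD is its square root. The sketch below follows these standard definitions with base-2 logarithms, so the distance lies in [0, 1]; it is not the implementation used by Sáez et al. or SPuMoNI (SciPy also provides `scipy.spatial.distance.jensenshannon`).

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p: list[float], q: list[float]) -> float:
    """Square root of the JS divergence; bounded in [0, 1] with base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # mixture distribution
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))               # guard against tiny negative rounding
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and stays finite when one distribution has zero mass where the other does not, because every m_i is positive wherever p_i or q_i is.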
Within the pharmaceutical industry context, data quality assurance must consider that data is generated by multiple heterogeneous sources (e.g., machines, operators, instruments, or even factories producing the same products). In addition, the manufacturing process may vary over time. In this scenario, temporal and multi-source variability must be explicitly addressed; otherwise, it may lead to inaccurate, irreproducible [30,31], or invalid [32] results. Multi-source variability analysis uses statistics [32,33] to describe variable distributions [34] or to compare data with a reference data set [26]. Temporal variability analysis uses monitoring statistics, e.g., Shewhart charts [35], Levey-Jennings charts, and Westgard rules [36]. SPuMoNI systematically applies probability distribution methods to robustly analyse temporal and multi-source variability in order to determine ALCOA compliance.
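Classical temporal monitoring compares each new observation with control limits derived from a known mean and standard deviation. The sketch below checks two common Westgard rules on a Levey-Jennings-style series: 1_3s (one value beyond 3 SD) and 2_2s (two consecutive values beyond 2 SD on the same side). The mean and SD are assumed to come from a reference period, and the function name is hypothetical.

```python
def westgard_violations(values: list[float], mean: float,
                        sd: float) -> list[tuple[int, str]]:
    """Flag the 1_3s and 2_2s Westgard rules against a known mean and SD."""
    z = [(v - mean) / sd for v in values]           # distance from the mean in SD units
    flags = []
    for i, zi in enumerate(z):
        if abs(zi) > 3:                             # 1_3s: one value beyond +/- 3 SD
            flags.append((i, "1_3s"))
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            flags.append((i, "2_2s"))               # 2_2s: two in a row beyond 2 SD, same side
    return flags
```

Such rules assume a single, stable reference distribution, which is one reason why distribution-comparison methods are better suited to multi-source and multimodal manufacturing data.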

Intelligent agents
Suitable for distributed, unstructured, and decentralised architectures that can be dynamically altered and can accommodate high degrees of complexity, "agents" [37] are associated with a certain degree of autonomy, i.e., the capacity to act on their own towards designated objectives. Grouped into "Multi-Agent Systems (MAS)", agents continually collaborate, coordinate, and negotiate to provide collective functionality that is difficult to predetermine analytically.
Agent-based systems have long been used to achieve dynamic customisation, improved quality, reliability, and flexibility in industrial settings [38]. Agent-based solutions are particularly suited to complex manufacturing processes since they can offer decentralised management and process control functions, coupled with intelligent cooperation and synchronisation capabilities. Such solutions typically involve cooperative multi-agent architectures [39] and have been widely employed for the integration of design, manufacturing, and shop-floor control activities, as well as for manufacturing process monitoring and optimisation [40].
For agents to be able to reason and communicate with each other, particularly in open environments, explicit application domain models are needed. To this end, ontologies are commonly employed for agent communication, since they can explicitly represent domain concepts, the relationships between them, and domain rules and their semantics. This is particularly relevant in industrial agent applications, since the execution of industrial processes should depend not only on their internal state and on user interactions, but also on the context of their execution [41]. Therefore, ontologies are a particularly suitable approach for representing manufacturing knowledge in a machine-interpretable manner.
Agent-based architectures have been successfully used to ensure the integrity of stored data [42]. The approach typically involves agents periodically testing the integrity of specified data volumes, e.g., by regularly verifying the hash values of stored data files. Furthermore, agents are commonly used to maintain data integrity in conjunction with blockchain infrastructures, for example by using smart contracts to monitor file hash values [43] and by using agents to implement efficient deduplication of blockchain-stored data [44].
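The periodic hash-verification pattern described above can be sketched as a small Python class. `IntegrityAgent` is a hypothetical name; it stores reference SHA-256 digests in a dictionary, whereas in the blockchain-based approaches cited above the reference values would be held by a smart contract. Scheduling (calling `check` periodically) is left out.

```python
import hashlib
from pathlib import Path


class IntegrityAgent:
    """Hypothetical agent that re-hashes monitored files and reports mismatches."""

    def __init__(self) -> None:
        self.registry: dict[Path, str] = {}       # path -> reference SHA-256 digest

    @staticmethod
    def _sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):   # stream large files
                h.update(chunk)
        return h.hexdigest()

    def register(self, path: Path) -> None:
        """Record the reference digest of a file at registration time."""
        self.registry[path] = self._sha256(path)

    def check(self) -> list[Path]:
        """Return the files that are missing or whose digest has changed."""
        return [p for p, ref in self.registry.items()
                if not p.exists() or self._sha256(p) != ref]
```

A non-empty result from `check` is the trigger for notifying operators; the reference digests themselves must be protected, otherwise an attacker could update both the file and its registered hash.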
Along these lines, agents are used in SPuMoNI to form an intelligent data gathering and processing mechanism in which incoming data are matched with an extended pharmaceutical manufacturing ontology that provides suitable constructs for explicitly representing ALCOA compliance information, and in which initial data integrity checks are performed close to the data source to enable early notification and reaction by the system operators. Furthermore, the integrity and end-to-end verification of the generated batch number records and intermediate process data are ensured by storing the relevant medicament data in the blockchain. Finally, agents execute collective pattern prediction models which enable reactions to forthcoming process data discrepancies before they are actually recorded and affect production performance.
The SPuMoNI framework uses agents to intelligently extract, transform, and control diverse heterogeneous data sources generated at distinct points within the manufacturing process and, subsequently, to ensure the integrity of the generated data records.

Related work
Disclosure risk assessment techniques in pharmaceutical manufacturing typically depend on background knowledge, the behaviour of intruders, and the specific value of the data. Often, only heuristic arguments are used, without numerical assessment [45]. In this context, to the best of our knowledge, there is no pharma-related literature which couples blockchain and smart contracts with MAS and data quality mechanisms for the ALCOA principles. The SPuMoNI approach will make it possible to track the medicine manufacturing process and ensure that all generated data is ALCOA compliant. However, the literature shows that some research in the pharmaceutical domain has been conducted to guarantee transparency and traceability.
Blockchain and smart contracts SPuMoNI is particularly timely as blockchain has recently been proposed to become "a new Digital Service Infrastructure" for Europe [46]. In this domain, blockchain has been explored mainly as a distributed authority in the supply chain process, which typically includes the manufacturer, the wholesaler, and the retailer [47,48,49]. Furthermore, blockchain-based smart contracts are used to define the relationships between the participants and to ensure traceability through an interorganisational business process [50].

Intelligent agents
The explosive growth of manufacturing data has resulted in the proliferation of intelligent data analytics systems based on intelligent agents. Such systems are typically organised in layers (e.g., Tang et al. (2018) [39]), learn and adapt to dynamic environments [51], and continuously analyse incoming data to optimise manufacturing processes [52]. Along this line, recent attempts have focused on integrating agents with blockchain technologies to ensure accountability and trusted interactions [53]. Specifically in the pharmaceutical industry, solutions based on blockchain and agents have been developed for the coordination of logistics services [54].
Agents are commonly used in conjunction with ontologies to manage data and reduce the complexity of pharmaceutical manufacturing processes. Cao et al. (2018) [55] propose an ontological information infrastructure which integrates data within pharmaceutical manufacturing plants, based on the ANSI/ISA-88.01 batch control standard. Similarly, Chalortham et al. (2008) [56] describe a tablet production ontology for a generic drug tablet production expert system. Furthermore, an ontological information infrastructure aiming to reduce pharmaceutical process development time and achieve better quality assurance is discussed by Haile- […] [58].
Finally, ontologies have been employed to model pharmaceutical manufacturing processes and to ensure their compliance with international standards and regulations. Sesen et al. [59] introduce an ontological infrastructure to support decision-making for pharmaceutical regulatory compliance. The proposed system, termed OntoReg, is integrated with a reasoner and a Java rule engine.
However, none of the above approaches specifically cater for regulatory and ALCOA compliance of the pharmaceutical manufacturing process data.

Contributions
Blockchain technologies in pharmaceutical manufacturing lines, together with data quality assurance and intelligent agents, are an emerging research topic. Table 1 compares pharma-related blockchain applications, which have mainly been applied to ensure reliability in the supply chain. In this context, the SPuMoNI project contributes an innovative solution coupling MAS, blockchain technology, and data quality assurance.

Table 1. Pharma-related blockchain applications.
Reference | Application
Mackey et al. [47] | Supply chain
Helo and Hao [48] | Supply chain
Bocek et al. [49] | Supply chain
Casado-Vara et al. [54] | Logistics

SPuMoNI Proposal
Manufacturing lines Agents initially match collected data with suitable data descriptions based on a pharmaceutical ontology so as to provide for ALCOA compliance. For example, for temperature measurements taken from a given machine, the actual temperature values are bundled together with additional information on how the data was obtained (e.g., sensor ID and measurement frequency). Furthermore, agents convert data to a unified format, for instance by ensuring that all measurements have the same number of decimal digits.
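The temperature example can be made concrete with a short sketch of what a data-gathering agent might emit: the reading bundled with its acquisition metadata and quantised to a fixed number of decimal digits. `AnnotatedMeasurement`, `normalise`, and their field names are hypothetical; the ontology matching itself is not modelled, and decimal rounding uses Python's `decimal` module to avoid binary floating-point artefacts.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal


@dataclass(frozen=True)
class AnnotatedMeasurement:
    """Hypothetical unified record: the value plus how it was obtained."""
    machine_id: str
    sensor_id: str
    sampling_hz: float
    quantity: str
    unit: str
    value: Decimal


def normalise(raw_value: str, machine_id: str, sensor_id: str,
              sampling_hz: float, decimals: int = 2) -> AnnotatedMeasurement:
    """Bundle a raw temperature reading with its acquisition metadata and
    quantise it to a fixed number of decimal digits."""
    step = Decimal(1).scaleb(-decimals)             # e.g. 1E-2 for two decimals
    value = Decimal(raw_value).quantize(step, rounding=ROUND_HALF_UP)
    return AnnotatedMeasurement(machine_id, sensor_id, sampling_hz,
                                "temperature", "degC", value)
```

For instance, the raw strings "41.237" and "41.2" both become two-decimal values ("41.24" and "41.20"), so downstream checks compare like with like.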
In addition to manipulating data, agents, owing to their inherently distributed and flexible nature, very effectively perform preliminary online data integrity checks close to the data source during data collection. In this way, data integrity issues can be discovered early, the appropriate users notified, and corrective actions taken immediately, thus reducing the costs incurred.
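A minimal sketch of such near-source checks is shown below, assuming readings that carry one value per sensor (six in the data sets described in the evaluation) and a single plausible range for all sensors. The `Reading` type, the check names, and the limits are hypothetical; a real agent would forward the returned messages to operators instead of just returning them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    timestamp: float                 # seconds since epoch
    values: List[Optional[float]]    # one slot per sensor; None marks a missing value


def preliminary_checks(readings: List[Reading], n_sensors: int,
                       low: float, high: float) -> List[str]:
    """Hypothetical near-source checks: completeness, range, and time ordering."""
    issues = []
    prev_ts = None
    for i, r in enumerate(readings):
        if len(r.values) != n_sensors or any(v is None for v in r.values):
            issues.append(f"reading {i}: missing sensor values")
        elif any(not (low <= v <= high) for v in r.values):
            issues.append(f"reading {i}: value outside [{low}, {high}]")
        if prev_ts is not None and r.timestamp <= prev_ts:
            issues.append(f"reading {i}: non-increasing timestamp")
        prev_ts = r.timestamp
    return issues
```

Running these cheap checks at the source means that an incomplete or out-of-order record is flagged before it is stored or submitted to the blockchain, where it could no longer be silently corrected.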
Then, the novel probability distribution methods verify the ALCOA compliance of the manufacturing record data. On the one hand, each ALCOA principle is quantified for each batch record through data quality assurance (DQA) outcomes. On the other hand, temporal and multi-source variability across manufacturing data repositories is evaluated. These results may uncover patterns, clusters, or deviations, thereby supporting pharmaceutical companies and external auditors in decision-making.
Blockchain and smart contracts ensure traceability and end-to-end verification. Moreover, the immutability of the blockchain arguably ensures that SPuMoNI data sets remain unchanged and unaltered, thus supporting the ALCOA principles and, ultimately, data integrity. Additionally, the sensitive nature of pharma-related data benefits from the high level of security of blockchain. Nevertheless, to mitigate security threats, SPuMoNI deploys a private Ethereum network fully hosted and managed within the consortium. Although PoA is widely considered suitable for private networks that require identity validation of all nodes, our private Ethereum network has been evaluated with both PoW and PoA. By using "Ether" (Ethereum's own currency), SPuMoNI will eventually monitor and assign processing costs in a future commercial version of the system.

Architectural specification
This section introduces the architectural specification of the intelligent agents and ALCOA-compliant mechanisms to trace the pharmaceutical process and ensure its quality, transparency, and authenticity. Our approach to medicament manufacturing traceability within SPuMoNI is predicated on three data-intensive pillars, whose interactions are represented in Fig. 1:
1. End-to-End Verification
2. Data Quality
3. Intelligent Data Analytics

End-to-end verification
Pharmaceutical manufacturing requires end-to-end verification, where each step of the process is recorded and documented. Therefore, SPuMoNI has created a novel blockchain-based end-to-end verification system for medicament manufacturing which serves two purposes: (i) marshalling data generation within production processes; and (ii) serving as an audit mechanism for any operations performed on any data set within the system. Both purposes require data processing, analysis, and access to be transparent, auditable, and secure. Therefore, the SPuMoNI data records are stored within a distributed ledger, preventing manipulative and coercive activities.
Owing to the General Data Protection Regulation (GDPR) and pharmaceutical industry regulations, SPuMoNI uses a private Ethereum network. This private network is mainly used by the project consortium to store data which should not be visible to the outside community. It acts as a distributed database containing private data, where access is permission-based. In addition, in private Ethereum networks, it is possible to manage the price of transactions and to allocate ether (Ethereum's currency) to the network accounts.
The SPuMoNI Ethereum network deployment is composed of virtual machines (nodes) which mine and approve all data transactions using consensus algorithms. These algorithms are used to achieve agreement on transaction values from the manufacturing production lines and on their provenance. So far, the SPuMoNI Ethereum network has been tested with two consensus algorithms, PoW and PoA, to enable an empirical evaluation of system performance. The SPuMoNI system aims to fully marshal the ALCOA principles in the pharmaceutical manufacturing process using blockchain and smart contracts.
At this stage, some asset-centric security threats have been considered, essentially to ensure ALCOA compliance for data integrity. Since the majority of blockchain attacks are at the network level [60], we have already taken into account threats to the mining process and to the network. To mitigate those threats, and given the sensitivity of SPuMoNI data, the current SPuMoNI version uses a private Ethereum network hosted and managed by the consortium. The private Ethereum network is composed of two miner nodes. The consortium manages access control to the private network as well as user authentication and role assignment. Additionally, PoA ensures that, for a node to become a miner, the existing miners must vote to confirm that the new miner's identity is genuine and reliable. The PoA configuration requires a "master" miner responsible for adding new miners, because we intend to keep the blockchain network comparatively small and fully private. Therefore, being hosted and managed by a private entity, the network avoids dishonest miners that may carry out network-related attacks.
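The miner-admission vote can be illustrated with a simplified model of Clique, the PoA engine of go-ethereum, in which an authorised signer proposes a candidate (for instance through the `clique_propose` RPC) and the candidate is admitted once more than half of the current signers have voted for it. The class below is a hypothetical sketch: real Clique votes travel in block headers and are reset at epoch boundaries, which is not modelled here.

```python
class CliqueLikeSigners:
    """Hypothetical PoA admission model: a candidate joins once more than
    half of the current signers have voted for it."""

    def __init__(self, signers: set[str]):
        self.signers = set(signers)
        self.votes: dict[str, set[str]] = {}     # candidate -> signers voting for it

    def propose(self, voter: str, candidate: str) -> bool:
        """Cast a vote; return True once the candidate is an authorised signer."""
        if voter not in self.signers:
            raise PermissionError(f"{voter} is not an authorised signer")
        if candidate in self.signers:
            return True
        ballot = self.votes.setdefault(candidate, set())
        ballot.add(voter)                         # repeated votes by one signer count once
        if len(ballot) > len(self.signers) // 2:  # strict majority of current signers
            self.signers.add(candidate)
            del self.votes[candidate]
            return True
        return False
```

With the two-miner configuration described above, a strict majority means both existing miners must approve a newcomer, so a single compromised node cannot admit a dishonest miner on its own.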

Data quality assurance
Data quality assurance (DQA) aims to verify the ALCOA compliance of the data originating throughout the manufacturing process. This analysis encompasses single and multiple batch evaluations. While the single batch evaluation assesses each ALCOA principle for the corresponding batch, the multiple batch evaluation includes a temporal and multi-source variability characterisation of both the ALCOA principles and the specific variables of manufacturing sensors.
Fareva (IDA) has provided pharmaceutical manufacturing reports which will enable the quantification of ALCOA principles using data from a real industrial environment.
It is duly noted that the IDA IT environment and its regulated processes have been inspected and audited by different pharmaceutical governmental agencies in Europe and abroad; hence, the SPuMoNI data sets on which this work is based are highly representative. Fig. 2 summarises the report structure through an Entity-Relationship Diagram (ERD) using the Unified Modelling Language (UML) notation. These reports describe the entire manufacturing process of a pharmaceutical batch record. The process is composed of several stages with multiple instructions according to the corresponding medicine recipe. Additionally, these reports provide the manufacturing data records created throughout the entire process. DQA evaluates and quantifies each ALCOA principle in terms of temporal and multi-source variability (see Section 2.3) among the manufacturing data records.
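One way to characterise multi-source variability in the spirit of the distribution-comparison methods of Sáez et al. [28] is to build a histogram of the same variable for each source on shared bins and compute pairwise Jensen-Shannon distances. The sketch below is hypothetical (function names, bin edges, and the choice of sources are assumptions) and is not the SPuMoNI DQA implementation; a source whose distances to all others are large stands out as a potential anomaly.

```python
import math
from bisect import bisect_right
from itertools import combinations


def histogram(samples: list[float], edges: list[float]) -> list[float]:
    """Relative frequencies over shared bin edges; out-of-range values go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        i = min(max(bisect_right(edges, x) - 1, 0), len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_distance(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float]) -> float:
        return sum(xi * math.log2(xi / mi) for xi, mi in zip(x, m) if xi > 0)

    return math.sqrt(max(0.5 * kl(p) + 0.5 * kl(q), 0.0))


def source_variability(sources: dict[str, list[float]],
                       edges: list[float]) -> dict[tuple[str, str], float]:
    """Pairwise JS distances between the value distributions of each data source."""
    hists = {name: histogram(xs, edges) for name, xs in sources.items()}
    return {(a, b): js_distance(hists[a], hists[b])
            for a, b in combinations(sorted(hists), 2)}
```

Applying the same function to consecutive time windows of one source, instead of to different sources, gives a simple temporal variability profile.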

Intelligent data analytics
The SPuMoNI MAS can be viewed as a cognition layer on top of the manufacturing process, supporting data extraction and process control. Agents are organised in a distributed software architecture tightly coupled with the SPuMoNI blockchain infrastructure. The SPuMoNI approach includes a three-layer system architecture to address both pre-processing control and data management of pharma manufacturing processes. Input data is collectively gathered and processed through distributed interactions of agents that act as wrappers on heterogeneous data sources.
Data flows towards the upper layers, and the information is aggregated, analysed, and visualised as needed. An example of the main SPuMoNI system dashboard depicting outliers in the running process is shown in Fig. 3.
Apart from data sensing, management, and fusion, the responsibilities of agent components include data analysis and prediction. Distributed Artificial Intelligence techniques are used to find patterns that may lead to deviations in manufacturing process data and to send an alert before actual deviations are recorded. Among others, biologically inspired deep learning models, such as Spiking Neural Networks [61], are used to predict deviations and produce alerts for multivariate conditions that change dynamically over time. In this way, prediction models can be constructed on the combined set of parameters to reveal collective patterns that might result in deviations, even when all parameters are currently within limits. This approach, combined with the multi-source variability models mentioned in Section 5.4, aims to ensure that manufacturing processes produce the best possible drug quality.
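The idea of alerting before a deviation is recorded can be illustrated with a deliberately simple technique: fit a least-squares line to each parameter's recent history and check whether its extrapolation leaves the allowed limits within a short horizon, even though the latest value is still within limits. This is a hypothetical stand-in and not the Spiking Neural Network approach described above; in particular, it treats parameters independently and therefore misses collective multivariate patterns.

```python
def fit_line(ys: list[float]) -> tuple[float, float]:
    """Least-squares line through (0, y0), (1, y1), ...; needs at least two points.
    Returns (slope, intercept)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def early_alerts(history: dict[str, list[float]],
                 limits: dict[str, tuple[float, float]], horizon: int) -> list[str]:
    """Flag parameters still within limits whose linear trend leaves them
    within `horizon` further steps."""
    alerts = []
    for name, ys in history.items():
        slope, intercept = fit_line(ys)
        predicted = intercept + slope * (len(ys) - 1 + horizon)
        low, high = limits[name]
        if low <= ys[-1] <= high and not (low <= predicted <= high):
            alerts.append(name)
    return alerts
```

For example, a temperature rising by one degree per step from 40 to 44 is still within a 30-45 range, but its trend reaches 47 three steps ahead, so it is flagged before any out-of-limit value is actually recorded.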

Evaluation
SPuMoNI encompasses three data-intensive pillars: (i) end-to-end verification supported by a blockchain infrastructure; (ii) data quality assurance to evaluate ALCOA compliance; and (iii) intelligent data analytics managed by a MAS. Specifically, to evaluate the blockchain infrastructure, we have carried out experiments using industry-grade pharma data sets whose manufacturing lines are depicted in Fig. 5.

Blockchain infrastructure
Our blockchain infrastructure is composed of a private Ethereum network hosted on the National College of Ireland's OpenStack private cloud. It uses go-ethereum 2 as the Ethereum client, Web3J 3 as the Java Application Programming Interface, and Solidity 4 as the smart contract language. This infrastructure has been tested with Java Development Kit version 13.0.2 and deployed on the OpenStack platform (Train release). Each OpenStack instance has 16 GB of RAM, 8 CPUs, and 160 GB of hard-disk space. New nodes can be dynamically added to the network, because a joining node can synchronise the entire chain held by the remaining nodes. Fig. 4 contains a high-level diagram of the SPuMoNI blockchain infrastructure. We have employed this infrastructure to evaluate performance using the PoW and PoA consensus algorithms. Since PoW requires a mining time of 15-20 seconds, we have configured PoA with a period of 12 seconds for comparison purposes. Additionally, the private Ethereum network has been configured with a block gas limit of 0x8000000 (134,217,728 gas) and provides cost-free processing, i.e., transactions are submitted with a gas price of 0. We have carried out all the experiments using industry-grade pharma data sets.
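The PoA settings above (a 12-second block period and a block gas limit of 0x8000000) would be expressed in the genesis file that go-ethereum reads with `geth init`. The following sketch builds a minimal Clique genesis dictionary under stated assumptions: the chain ID and signer addresses are hypothetical, all listed forks are activated at block 0, the epoch uses the common default of 30000, and no accounts are pre-funded. It is not the consortium's actual configuration.

```python
import json


def clique_genesis(chain_id: int, signers: list[str], period: int = 12,
                   gas_limit: str = "0x8000000") -> dict:
    """Minimal go-ethereum genesis dictionary for a Clique (PoA) private network."""
    addresses = []
    for a in signers:
        hex_addr = a.lower().removeprefix("0x")
        if len(hex_addr) != 40:
            raise ValueError(f"not a 20-byte address: {a}")
        addresses.append(hex_addr)
    vanity = "00" * 32    # 32-byte vanity prefix of the Clique extraData field
    seal = "00" * 65      # 65-byte placeholder reserved for the signer seal
    return {
        "config": {
            "chainId": chain_id,
            "homesteadBlock": 0, "eip150Block": 0, "eip155Block": 0,
            "eip158Block": 0, "byzantiumBlock": 0, "constantinopleBlock": 0,
            "petersburgBlock": 0, "istanbulBlock": 0,
            "clique": {"period": period, "epoch": 30000},
        },
        "difficulty": "1",
        "gasLimit": gas_limit,                  # block gas limit, hex-encoded
        "extraData": "0x" + vanity + "".join(sorted(addresses)) + seal,
        "alloc": {},                            # no pre-funded accounts
    }


if __name__ == "__main__":
    # Hypothetical chain ID and addresses for two signer (miner) nodes.
    genesis = clique_genesis(1234, ["0x" + "ab" * 20, "0x" + "cd" * 20])
    print(json.dumps(genesis, indent=2))
```

The initial signers are embedded in `extraData` between the vanity prefix and the seal; further signers are then admitted by voting rather than by editing the genesis file.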
Data set. The data sets contain time series from two different machines on pharmaceutical manufacturing production lines. Each machine generates raw data from non-stop sensors; the data set contains the values of six different sensors together with the corresponding timestamp. Each blockchain transaction is composed of: (i) a batch or asset identifier (ID); (ii) the timestamp of the sensor data record; and (iii) the six sensor values. Our experiments involve 103 403 transactions.
Performance results. Performance was assessed in terms of throughput and latency, following benchmarking best practices [16,62]. Throughput, in transactions per second (tps), is the number of successful blockchain transactions per unit of time. Latency, in seconds (s), is the time delay between the submission and the completion of a blockchain transaction.
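The transaction payload described above can be modelled as a small record type. In the sketch below the field names, the ISO 8601 timestamp format and the SHA-256 fingerprint over a canonical JSON encoding are hypothetical, illustrative choices; they are not the SPuMoNI smart-contract interface, which is written in Solidity and accessed through Web3J.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple

N_SENSORS = 6  # the data set described above has six sensor channels

@dataclass(frozen=True)
class SensorTransaction:
    """Hypothetical payload of one blockchain transaction."""
    batch_id: str              # (i) batch or asset identifier
    timestamp: str             # (ii) timestamp of the sensor record (ISO 8601 assumed)
    values: Tuple[float, ...]  # (iii) one reading per sensor

    def __post_init__(self):
        if len(self.values) != N_SENSORS:
            raise ValueError(f"expected {N_SENSORS} sensor values, got {len(self.values)}")

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators give a deterministic encoding,
        # so the same record always produces the same bytes.
        payload = {"batch_id": self.batch_id, "timestamp": self.timestamp,
                   "values": list(self.values)}
        return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def fingerprint(self) -> str:
        """SHA-256 hex digest of the canonical encoding (illustrative integrity check)."""
        return hashlib.sha256(self.canonical_bytes()).hexdigest()

tx = SensorTransaction("BATCH-0001", "2020-03-01T10:00:00Z",
                       (21.4, 1013.2, 45.0, 0.8, 120.5, 3.2))  # invented values
print(tx.fingerprint())
```

Any change to a single sensor value yields a different fingerprint, which is the property an on-chain record relies on to make later tampering detectable.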
For a set of transactions, the average latency and throughput correspond to the average over all transactions in the data set. Table 2 presents the performance evaluation of the SPuMoNI blockchain network using the industrial pharma-related data, where μ is the average and σ the standard deviation of latency and throughput over all transactions in the data set submitted to the blockchain. The network was tested with minimal resources, i.e., all transactions were received by a single node, in order to establish the network's baseline latency and throughput for both consensus algorithms.
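The latency and throughput definitions, together with their μ and σ, can be computed from per-transaction submission and completion times. In this sketch the input format `(submitted_at, completed_at, succeeded)` is an assumption, and throughput is measured over fixed time windows (for instance one PoA block period) so that μ and σ can be taken across windows; the paper does not state which windowing it used.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def latencies(records):
    """Latency of each successful transaction: completion time minus submission time (s)."""
    return [done - sent for sent, done, ok in records if ok]

def windowed_throughput(records, window=1.0):
    """Successful transactions per second in consecutive windows of `window` seconds."""
    completions = [done for _, done, ok in records if ok]
    if not completions:
        return []
    first = math.floor(min(completions) / window)
    last = math.floor(max(completions) / window)
    counts = Counter(math.floor(t / window) for t in completions)
    # Windows in which no transaction completed count as 0 tps
    return [counts.get(w, 0) / window for w in range(first, last + 1)]

def summarise(samples):
    """Return (mu, sigma): mean and population standard deviation."""
    return mean(samples), pstdev(samples)

# Hypothetical (submitted_at, completed_at, succeeded) times in seconds
records = [(0.0, 12.0, True), (1.0, 12.0, True), (2.0, 24.0, True), (3.0, 24.5, False)]
mu_lat, sigma_lat = summarise(latencies(records))
mu_tps, sigma_tps = summarise(windowed_throughput(records, window=12.0))
print(mu_lat, sigma_lat, mu_tps, sigma_tps)
```

Failed transactions are excluded from both metrics, matching the definitions above, which refer only to successful transactions.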
The results indicate that PoA outperforms PoW. Moreover, PoA provides more stable throughput values, since the σ of its mining time is 0 (blocks are sealed at a fixed period). Accordingly, we have adopted PoA within the SPuMoNI blockchain infrastructure. Despite the cost-free processing in the current Ethereum settings, we are using Ethereum because the SPuMoNI project intends to create a commercial product for the pharmaceutical industry in which Ethereum's currency can be used to manage transaction costs.

Pharmaceutical manufacturing lines
As illustrated in Fig. 5, any major pharmaceutical plant involves multiple manufacturing lines structured in independent packaging
In summary, the IN-OUT materials handling flows encompass the methods, equipment, and systems for conveying materials from the incoming bays to the various machines and processing areas, and for transferring finished parts to assembly, packaging, and the warehouse and, ultimately, to shipping areas. Finally, the production stages highlighted in pink in Fig. 5 typically entail dispensing,
Pharma production lines generate large amounts of data from non-stop sensors located on multiple industrial machines and instruments. As in any real big data context, pharma-related data repositories are large, variable, and heterogeneous; yet, critically for the pharmaceutical industry, the data must be traceable, auditable, and ALCOA compliant. To approach this problem, the SPuMoNI architecture is composed of: (i) a MAS for data management; (ii) data quality assurance for ALCOA compliance; and (iii) a blockchain-based infrastructure to ensure data authenticity, immutability, and traceability.
From its validated ICT environment, Fareva (IDA) has provided fully anonymised manufacturing reports and raw time-series data covering the process parameters and environmental conditions recorded during the production of multiple batches of distinct medicament recipes. These data have enabled the simulation of agent interaction and the corresponding validation of the data quality and ALCOA compliance of the manufacturing processes, underpinned by blockchain technologies and smart contracts.

Multi-agent system
The MAS aims to ensure batch number validity and to guarantee ALCOA-compliant data records. To this end, the system generates reports that provide a global view of the manufacturing environment and enable prompt corrective actions. Moreover, agents perform a priori ALCOA compliance tests and complex data operations such as high-dimensional heterogeneous data management and adaptive and evolving data classification. Fig. 6 exemplifies batch outlier detection within the SPuMoNI framework. Once the deep learning and prediction algorithms are finalised, the SPuMoNI system will be tested and evaluated in pilot scenarios using real-world data provided by IDA-Fareva.
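The algorithm behind the batch outlier detection in Fig. 6 is not specified here. As a minimal, hypothetical illustration, the sketch below flags batches whose summary value (for example, the mean of one process parameter over the batch) has a modified z-score above 3.5, a cut-off commonly recommended by Iglewicz and Hoaglin; the score uses the median and the median absolute deviation (MAD), so a single extreme batch does not inflate the spread. The batch identifiers and values are invented.

```python
from statistics import median

def modified_z_scores(values):
    """Modified z-scores 0.6745 * (x - median) / MAD (Iglewicz and Hoaglin)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Most values are identical; this simple sketch then reports
        # no deviation rather than dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def outlier_batches(batch_summary, threshold=3.5):
    """Return the IDs of batches whose |modified z-score| exceeds `threshold`."""
    ids = list(batch_summary)
    scores = modified_z_scores([batch_summary[b] for b in ids])
    return [b for b, z in zip(ids, scores) if abs(z) > threshold]

# Hypothetical per-batch means of one process parameter
batch_summary = {"B01": 50.1, "B02": 49.8, "B03": 50.3,
                 "B04": 50.0, "B05": 49.9, "B06": 55.2}
print(outlier_batches(batch_summary))
```

A robust score is used here because a mean and standard deviation computed over the same batches would be pulled towards the outlier it is supposed to detect.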

Data quality assurance
The current data quality analysis module is designed to evaluate compliance with each ALCOA principle across the manufacturing records, and it also includes a characterisation of the temporal and multi-source variability of the manufacturing data. These variability metrics measure the magnitude of changes in a way that makes them comparable across different domains. An example of this variability characterisation is shown in Fig. 7.
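The specific variability metrics are not detailed in this section. One common way to obtain magnitudes that are comparable across domains is a bounded distance between probability distributions; the sketch below assumes the Jensen-Shannon distance with base-2 logarithms, which lies in [0, 1] regardless of the units of the underlying variable. The bin edges and machine readings are hypothetical.

```python
from math import log2, sqrt

def histogram(values, edges):
    """Relative frequencies over bins [e0, e1), [e1, e2), ..., with the last bin closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            in_last = i == len(counts) - 1 and v == edges[-1]
            if edges[i] <= v < edges[i + 1] or in_last:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs), bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        # Terms with x_i == 0 contribute 0; y_i > 0 whenever x_i > 0 since y is the mixture
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)
    return sqrt((kl(p, m) + kl(q, m)) / 2)

# Hypothetical readings of the same parameter from two machines
edges = [18.0, 20.0, 22.0, 24.0, 26.0]
machine_a = histogram([19.5, 20.5, 21.0, 21.5, 22.5, 23.0], edges)
machine_b = histogram([21.5, 22.5, 23.5, 24.5, 25.0, 25.5], edges)
print(round(js_distance(machine_a, machine_b), 3))
```

Comparing machines in this way characterises multi-source variability; comparing the distributions of consecutive time periods (for example, month against month) characterises temporal variability with the same function.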
The data quality module of the SPuMoNI system has initially been evaluated and validated through the analysis of a set of retrospective pharmaceutical manufacturing reports described in Section 4. The evaluation process consists of scoring the ALCOA principles for both single reports and multiple reports, as well as analysing the temporal and multi-source variability of relevant variables. Once the retrospective data have been evaluated, the performance of the data quality module will be assessed in a pilot.
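Scoring ALCOA principles over one or several reports can be sketched as the fraction of records that pass a rule-based check per principle. The checks below are hypothetical proxies for three of the five principles (Attributable, Contemporaneous, Legible); the record fields, the five-minute tolerance and the rules themselves are assumptions, not the SPuMoNI data quality module.

```python
from datetime import datetime, timedelta

MAX_RECORDING_DELAY = timedelta(minutes=5)  # hypothetical contemporaneity tolerance

def is_attributable(rec):
    """A record is attributable if it names who created it."""
    return bool((rec.get("operator") or "").strip())

def is_contemporaneous(rec):
    """Recorded at, or shortly after, the event it documents."""
    try:
        event = datetime.fromisoformat(rec["event_at"])
        recorded = datetime.fromisoformat(rec["recorded_at"])
    except (KeyError, TypeError, ValueError):
        return False
    return timedelta(0) <= recorded - event <= MAX_RECORDING_DELAY

def is_legible(rec):
    """Proxy for legibility: the recorded value can be read as a number."""
    try:
        float(rec["value"])
    except (KeyError, TypeError, ValueError):
        return False
    return True

CHECKS = {"Attributable": is_attributable,
          "Contemporaneous": is_contemporaneous,
          "Legible": is_legible}

def alcoa_scores(records):
    """Fraction of records passing each (hypothetical) ALCOA check."""
    if not records:
        raise ValueError("no records to score")
    return {name: sum(check(r) for r in records) / len(records)
            for name, check in CHECKS.items()}

report = [
    {"operator": "OP-17", "event_at": "2020-03-01T10:00:00", "recorded_at": "2020-03-01T10:02:00", "value": "21.4"},
    {"operator": "", "event_at": "2020-03-01T10:05:00", "recorded_at": "2020-03-01T11:00:00", "value": "21.6"},
    {"operator": "OP-17", "event_at": "2020-03-01T10:10:00", "recorded_at": "2020-03-01T10:10:30", "value": "n/a"},
    {"operator": "OP-22", "event_at": "2020-03-01T10:15:00", "recorded_at": "2020-03-01T10:25:00", "value": 21.5},
]
print(alcoa_scores(report))
```

Pooling the records of several reports into one list gives a multi-report score in the same form, while scoring each report separately exposes which reports drive non-compliance.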

Conclusions and further work
Pharmaceutical manufacturing lines are composed of multiple automated systems, some of them embedded, which control medicine production and generate large amounts of data. These data must be collected and maintained without compromising their integrity, as defined by the ALCOA principles. In this context, the pharmaceutical industry is consistently looking for effective technological solutions to improve ALCOA compliance in the manufacturing process. Such solutions must ensure data integrity and end-to-end traceability of medicine production. SPuMoNI has started to address this problem by building a decentralised system based on intelligent agents and data quality mechanisms built upon blockchain technology.
Data integrity is evaluated by data quality assurance methods. These methods evaluate whether the data records created and managed by manufacturing lines are ALCOA compliant; in addition, this module includes methods to analyse the temporal and multi-source variability of the data. However, the retrospective reports do not have fully validated ALCOA scores. This validation process will be carried out by the pharmaceutical industry experts of the project consortium (Fareva (IDA) and PQE), who will evaluate a set of qualitative and quantitative metrics to validate the results of the data quality module. Additionally, these experts will supervise the variability analysis approaches, with the aim of delivering functionalities that will have a significant impact in pharma manufacturing environments.
The SPuMoNI agent module includes dedicated agents that collectively implement intelligent data management, regulatory compliance verification and predictive analytics. Ultimately, sensing agents will collect and manage large amounts of data from non-stop sensors. Additional agent types collect data from other databases and construct batch number records, interact with the blockchain infrastructure, and perform real-time inference and predictive analytics. Agents use ontologies to represent and communicate domain knowledge, and bio-inspired algorithms for collective learning and decentralised data processing. Furthermore, the SPuMoNI MAS will provide intelligent process monitoring and predictive analysis of the pre-processed data. Specific requirements for manufacturing process monitoring, control and optimisation will be considered, such as the identification and selection of informative attributes in high-dimensional data, adaptive and evolving data classification, and dynamic event prediction. Decentralised data processing will enable end-to-end pharmaceutical process monitoring and timely alert generation, for example for outlier detection.
Moreover, various bio-inspired data analytics approaches [63] are currently being investigated, with the aim of predicting future process data quality problems from current and historical process data. Such analyses will eventually span manufacturing lines and medicament manufacturers, and they will cover both historical non-compliance anomalies and estimations of future non-compliance risks. To this end, Quality Risk Management (QRM) techniques will be used, in compliance with Quality by Design (QbD) standards and good manufacturing practices (GMP), such as the GAMP V-model [64].
Blockchain and smart contracts underpin data quality assurance for manufacturing processes and enable communication with and between intelligent agents for decision making. Blockchain consensus algorithms ensure the reliability of the recorded transactions. Given the big data scenario of the pharmaceutical industry, the SPuMoNI system manages a large number of transactions, and its blockchain infrastructure has been benchmarked under both the PoW and PoA consensus algorithms. As expected, the results show that PoA provides better latency; moreover, because PoA does not rely on computationally intensive mining, it also requires less power and energy than PoW.
At present, the SPuMoNI blockchain ledger acts as an audit and operational table. It will eventually be extended with zero-knowledge proofs for very sensitive data exchanged with third parties, a capability that is not yet in place in the current blockchain solution. Combined with the blockchain's distributed consensus mechanisms, this will also hinder malicious actors from falsifying or duplicating key information artefacts.
In this paper, we have presented the first version of the SPuMoNI implementation. The project aims to reach Technology Readiness Level (TRL) 7, i.e., a system prototype demonstration in an operational environment. The pharmaceutical company has provided manufacturing data sets and streams from real production lines. In this context, as near-future work, we intend to interconnect the three modules of the project (agents, data quality, and blockchain) in a simulation of the real environment. The SPuMoNI solution aims to bring reliability and transparency to the pharmaceutical industry, where compliance and risk assessment are critical to maintaining its reputation and, ultimately, to saving lives by making quality-proven products, packaged according to the requirements of any national drug agency, widely available.