Multivariate statistical monitoring of batch processes: an industrial case study of fermentation supervision

doi:10.1016/S0167-7799(00)01528-6

Trends in Biotechnology

Volume 19, Issue 2, 1 February 2001, Pages 53-62

https://doi.org/10.1016/S0167-7799(00)01528-6

Abstract

This article describes the development of Multivariate Statistical Process Control (MSPC) procedures for monitoring batch processes and demonstrates their application to industrial tylosin biosynthesis. Currently, the main fermentation phase is monitored using univariate statistical process control principles implemented within the G2 real-time expert system package. This development addresses the integration of various process stages into a single monitoring system and the observation of interactions among individual variables through the use of multivariate projection methods. The benefits of this approach are discussed from an industrial perspective.

Section snippets

Process description and data availability

Tylosin fermentation has been chosen as an example of a complex secondary metabolite production process. Tylosin production, as with most fermentation processes, involves various stages before and subsequent to the main fermentation, which has traditionally been the favoured area for improvements. However, deviations that influence final productivity might occur before the final stage and therefore it is clearly beneficial to focus on preceding operations of the fermentation. Tylosin production

Principles of MSPC

The principles of MSPC are published widely ⁴⁻⁶ and therefore only a brief summary will be given here. PCA involves finding the eigenvalues of the sample covariance matrix, which are the variances of the principal components. For a normalized (mean centred, variance scaled) sample matrix X [n, m] with n samples and m variables, PCA will find m uncorrelated new variables, the variance of which decreases from first to last. Let the new variables be represented by t_i for a particular sample i as
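For orientation, the relations summarised in this snippet can be written in the standard textbook form below. This is a sketch, not necessarily the article's own equation (which is cut off in this preview): the symbols S (sample covariance matrix), p_j (loading vectors), λ_j (eigenvalues) and P (matrix of loadings) are notation assumed here, and x_i denotes the i-th row of X written as a column vector.

```latex
S = \frac{1}{n-1} X^{\mathsf T} X, \qquad
S\,p_j = \lambda_j\,p_j, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0

t_{ij} = x_i^{\mathsf T} p_j, \qquad
T = X P, \qquad
\operatorname{var}(t_j) = \lambda_j
```

Because the loading vectors are orthonormal, the score columns t_j are uncorrelated, and ordering the eigenvalues gives the decreasing variance described above.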

MSPC for batch processes

Batch data differs from continuous data in that the problem is now 3D: the added dimension is time (k), alongside the number of variables logged (m) and the number of batches (n). The idea of arranging batch data in a meaningful fashion originates from Nomikos and MacGregor ⁸,⁹. They suggested a simple way to view batch data as a 3D data matrix constructed from layers of batches stacked onto each other (X [n, m, k]), that can be unfolded into 2D arrays in two different
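The unfolding step can be illustrated with a small sketch. It assumes the two unfoldings commonly used in the batch-monitoring literature (one row per batch, and one row per batch and time point); the snippet is truncated before it names them, so the function names, the column ordering and the toy data below are hypothetical choices made for illustration.

```python
def unfold_batch_wise(X):
    """Unfold X[n][m][k] into n rows and m*k columns (one row per batch).

    Columns are ordered time slice by time slice: all m variables at
    time 0, then all m variables at time 1, and so on.
    """
    n_vars = len(X[0])
    n_times = len(X[0][0])
    return [
        [batch[j][t] for t in range(n_times) for j in range(n_vars)]
        for batch in X
    ]


def unfold_variable_wise(X):
    """Unfold X[n][m][k] into n*k rows and m columns.

    Each row holds the m variables of one batch at one time point.
    """
    n_vars = len(X[0])
    n_times = len(X[0][0])
    return [
        [batch[j][t] for j in range(n_vars)]
        for batch in X
        for t in range(n_times)
    ]


# Toy data (hypothetical): 2 batches, 2 variables, 3 time points.
X = [
    [[1, 2, 3], [10, 20, 30]],
    [[4, 5, 6], [40, 50, 60]],
]
batch_wise = unfold_batch_wise(X)        # 2 rows, 6 columns
variable_wise = unfold_variable_wise(X)  # 6 rows, 2 columns
```

The batch-wise form treats each complete batch as a single observation, which suits batch-to-batch comparison; the variable-wise form keeps the variables as columns and treats every time point as an observation.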

Model building for fault detection and diagnosis

Process data from 144 batches were collected comprising 17 on-line variables recorded hourly during the main fermentation (∼140 hours) and 53 off-line variables recorded from operations preceding and during the fermentation stage (X). The final yield was used as an indicator of overall process performance (Y). The 65 batches with high yield were assumed to represent desirable operation (normal) and were used for model development. The 44 batches resulted in low yield, but no abnormalities could

Model building for performance estimation: Principal Component Regression

It is often interesting to relate readily available variables to process parameters that are difficult to measure by using predictive models to provide estimates. Although estimation algorithms have evolved rapidly over the last decades, they are only beginning to find their way into bioprocess applications and often traditional multiple linear regression techniques are still in use despite computational difficulties when dealing with correlated data ¹¹. Numerical problems are resolved by PCA
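As a reminder of how principal component regression sidesteps the numerical difficulties of correlated inputs, its usual textbook form is sketched below. The notation is assumed here rather than taken from the article: P_A holds the first A loading vectors, T_A the corresponding scores, y the measured performance variable and b the regression coefficients on the scores.

```latex
T_A = X P_A, \qquad
\hat{b} = \left(T_A^{\mathsf T} T_A\right)^{-1} T_A^{\mathsf T} y, \qquad
\hat{y} = T_A \hat{b} = X P_A \hat{b}

T_A^{\mathsf T} T_A = (n-1)\,\operatorname{diag}(\lambda_1, \dots, \lambda_A)
```

Since the score columns are mutually orthogonal, the matrix to be inverted is diagonal, so the inversion stays well conditioned as long as components with near-zero eigenvalues are excluded, even when the original variables are strongly correlated.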

Variable influences

Another task frequently requested of modelling exercises is to identify variables that are major process drivers. An industrial environment usually cannot deliver suitable data to understand variable influences because the variables are strictly controlled and therefore their effects cannot be investigated. The most appropriate knowledge about variable influences originates from fundamental process understanding or designed experimentation. In fact, prior knowledge concerning

MSPC in on-line monitoring of batch processes

The off-line analysis of historical data provided a useful tool for learning from data and suggested that multivariate transformations can potentially outperform univariate process monitoring. Real benefit only results if the above process deviations are detected and diagnosed in real time through the development of on-line control limits on the scores and SPE statistics with their contributions.
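The quantities named here can be sketched for a single normalized observation. This is a minimal illustration assuming orthonormal loading vectors and the common definition of the squared prediction error (SPE) as the sum of squared residuals after projection onto the model; the function names, the single loading vector and the sample values are hypothetical.

```python
def scores(x, loadings):
    """Project a normalized observation x onto each loading vector."""
    return [sum(xj * pj for xj, pj in zip(x, p)) for p in loadings]


def spe_contributions(x, loadings):
    """Squared residual of each variable after reconstruction from the scores."""
    t = scores(x, loadings)
    # Reconstruct x from the retained components only.
    x_hat = [
        sum(t[a] * loadings[a][j] for a in range(len(loadings)))
        for j in range(len(x))
    ]
    return [(xj - xhj) ** 2 for xj, xhj in zip(x, x_hat)]


# Hypothetical one-component model for three variables (unit-length loading).
loadings = [[0.6, 0.8, 0.0]]
x = [1.0, 0.0, 2.0]

t = scores(x, loadings)
contributions = spe_contributions(x, loadings)
spe = sum(contributions)
worst_variable = contributions.index(max(contributions))
```

In this toy case the third variable is not described by the model at all, so it dominates the SPE; inspecting the contributions in this way is what points a diagnosis at particular variables.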

The on-line control limits encapsulate natural process variation at each time interval, which is assumed
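One simple way to obtain limits that vary with time is sketched below. It assumes limits of the mean plus or minus three standard deviations of a monitored statistic across the normal batches at each time interval; this is an illustrative assumption, since the article's actual limit calculation is not visible in this snippet. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def time_varying_limits(trajectories, width=3.0):
    """Return one (lower, upper) limit per time interval.

    trajectories[b][t] is the monitored statistic (for example a score)
    of normal batch b at time interval t.
    """
    n_times = len(trajectories[0])
    limits = []
    for t in range(n_times):
        values = [trajectory[t] for trajectory in trajectories]
        centre = mean(values)
        spread = stdev(values)  # sample standard deviation across batches
        limits.append((centre - width * spread, centre + width * spread))
    return limits


def first_violation(trajectory, limits):
    """Return the first time index outside its limits, or None if there is none."""
    for t, (value, (low, high)) in enumerate(zip(trajectory, limits)):
        if not low <= value <= high:
            return t
    return None


# Hypothetical score trajectories of three normal batches over three intervals.
normal_batches = [
    [0.0, 1.0, 2.0],
    [0.2, 1.2, 2.2],
    [-0.2, 0.8, 1.8],
]
limits = time_varying_limits(normal_batches)

new_batch = [0.1, 1.1, 3.0]
alarm_time = first_violation(new_batch, limits)
```

Because the limits are computed separately for each interval, a trajectory is judged against what is normal at that point in the batch rather than against one fixed band.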

MSPC in performance forecasting

It would be of quantifiable benefit to estimate the eventual performance at early stages of the fermentation, because no performance measure is available until the batch terminates. Nomikos and MacGregor ¹² proposed a methodology for using MPCA approaches in forecasting final batch performance. These approaches are based on assuming certain process behaviour for the rest of the batch duration, such as: (1) average behaviour of runs used for model building; (2) persistent deviations from average
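The first two assumptions can be sketched as ways of completing a partially observed trajectory before it is projected onto the model. This is a minimal illustration for a single variable; the function name, the strategy labels and the numbers are hypothetical, and the snippet is truncated before any further options are listed.

```python
def fill_future(observed, average, strategy):
    """Complete a partially observed trajectory of one variable.

    observed: values measured so far in the running batch
    average:  mean trajectory of the batches used for model building
    strategy: "average"    -> the rest of the batch follows the average
              "persistent" -> the current deviation from the average persists
    """
    if strategy not in ("average", "persistent"):
        raise ValueError("strategy must be 'average' or 'persistent'")
    k = len(observed)
    offset = 0.0
    if strategy == "persistent" and k > 0:
        # Deviation from the average at the latest available time point.
        offset = observed[-1] - average[k - 1]
    return list(observed) + [a + offset for a in average[k:]]


# Hypothetical mean trajectory and a batch observed for two of four intervals.
average = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, 2.5]

as_average = fill_future(observed, average, "average")
as_persistent = fill_future(observed, average, "persistent")
```

The completed trajectory has the full batch length, so it can be unfolded and scored in the same way as a finished batch; the forecast then inherits whichever assumption was used to fill in the future values.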

Implementation

A prototype off-line package was developed (Matlab) to perform various data pre-processing tasks, such as screening data for missing values, outliers and noise, and input selection. The graphical user interfaces allow the user to easily create, save and re-use multivariate models, and to present the models, data and various statistics visually. The off-line tool includes a simulation of the on-line scenario, whereby the user can ‘play back’ the on-line multivariate monitoring charts of any

Discussion and industrial perspectives

Although the application of data-based techniques is appealing because of the relatively low resource requirements and rapid model development times, one of the main lessons learnt through this application is the significance of representative data. It appears to be a general feeling that raw information from industrial instrumentation does not provide sufficient insights into microbial behaviour. Although data quality is difficult to define, it is vital to the success of data-based

Acknowledgements

The authors would like to express their appreciation to S. Martin for sharing his knowledge about tylosin fermentation, D. Keates and D. Range for the on-line implementation, P. Mohan for managing this project, and all colleagues at Eli Lilly and Company Ltd who made this development possible.

Glossary

Expert Systems: A supervisory control system that makes use of expert knowledge in the form of rules to advise operators on process problems and control actions. The simple logic of associative decision rules is easily understood and accepted by humans; therefore ‘if-then’ rules are often used in the above context, in the form of ‘If X is true and Y is false, conclude class 1’.
Interactions: Chemical and biological systems are often highly complex and present complex, non-linear and dynamic dependence

References (16)

Glassey, J. (2000) Issues in industrial advisory system development. Trends Biotech.
Martin, E.B. et al. (1996) Non-parametric confidence bounds for process performance monitoring charts. J. Process Control
Montgomery, D.C. (1996) Introduction to Statistical Quality Control, Wiley &...
Duke, P. (1992) KAT – A Knowledge Acquisition Technique, Methodology Manual, CK Design,...
Jolliffe, I.T. (1986) Principal Component Analysis
Wise, B.M. (1990) A theoretical basis for the use of principal components models for monitoring multivariate processes. Process Control and Quality
MacGregor, J.F. (1994) Statistical Process Control of Multivariate Processes. IFAC World Congress, 1994, Kyoto,...
Geman, S. (1992) Neural networks and the bias/variance dilemma. Neural Computation

There are more references available in the full text version of this article.

