Trends in Biotechnology
Volume 19, Issue 2, 1 February 2001, Pages 53-62

Review
Multivariate statistical monitoring of batch processes: an industrial case study of fermentation supervision

https://doi.org/10.1016/S0167-7799(00)01528-6

Abstract

This article describes the development of Multivariate Statistical Process Control (MSPC) procedures for monitoring batch processes and demonstrates their application to industrial tylosin biosynthesis. Currently, the main fermentation phase is monitored using univariate statistical process control principles implemented within the G2 real-time expert system package. This development addresses the integration of various process stages into a single monitoring system and the observation of interactions among individual variables through the use of multivariate projection methods. The benefits of this approach are discussed from an industrial perspective.

Section snippets

Process description and data availability

Tylosin fermentation has been chosen as an example of a complex secondary metabolite production process. Tylosin production, as with most fermentation processes, involves various stages before and subsequent to the main fermentation, which has traditionally been the favoured area for improvements. However, deviations that influence final productivity might occur before the final stage and therefore it is clearly beneficial to focus on preceding operations of the fermentation. Tylosin production

Principles of MSPC

The principles of MSPC are widely published [4–6] and therefore only a brief summary will be given here. Principal component analysis (PCA) involves finding the eigenvalues of the sample covariance matrix, which are the variances of the principal components. For a normalized (mean centred, variance scaled) sample matrix X [n, m] with n samples and m variables, PCA will find m uncorrelated new variables, the variance of which decreases from first to last. Let the new variables be represented by ti for a particular sample i as
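The summary above can be illustrated with a minimal sketch, assuming NumPy and synthetic data. The function name `pca_scores` and the toy matrix are illustrative and not taken from the article; the sketch only shows that the eigenvalues of the covariance matrix of the normalized data equal the variances of the (uncorrelated) scores.

```python
import numpy as np

def pca_scores(X, n_components):
    """PCA of a sample matrix X [n, m] via eigen-decomposition of its covariance."""
    # Normalize: mean centre and scale each variable to unit variance.
    Xn = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Sample covariance matrix of the normalized data (m x m).
    S = np.cov(Xn, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]              # loadings
    T = Xn @ P                                 # scores: row i holds t_i for sample i
    return T, P, eigvals

# Synthetic example (hypothetical data): 50 samples, 4 variables, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=50)
T, P, eigvals = pca_scores(X, n_components=2)
```

Because the data are variance scaled, the eigenvalues sum to the number of variables m, which gives a quick check on how much of the total variance the retained components explain.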

MSPC for batch processes

Batch data differ from continuous data in that the problem is now three-dimensional (3D): besides the number of variables logged (m) and the number of batches (n), there is the added dimension of time (k). The idea of manipulating batch data in a meaningful fashion originates from Nomikos and MacGregor [8,9]. They suggested a simple way to view batch data as a 3D data matrix constructed from layers of batches stacked onto each other (X [n, m, k]), that can be unfolded into 2D arrays in two different
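The snippet above is cut off before the two unfoldings are named. As a hedged illustration, the sketch below shows the two layouts most commonly described in the batch-monitoring literature, batch-wise and variable-wise unfolding; whether these are exactly the two the authors go on to describe is an assumption. The toy sizes and array names are illustrative.

```python
import numpy as np

n, m, k = 3, 2, 4   # batches, variables, time points (toy sizes)
X = np.arange(n * m * k, dtype=float).reshape(n, m, k)   # X[batch, variable, time]

# Batch-wise unfolding: one row per batch; columns are grouped by time interval
# (all m variables at time 0, then all m variables at time 1, and so on).
X_batch = X.transpose(0, 2, 1).reshape(n, k * m)

# Variable-wise unfolding: one row per (batch, time) observation,
# one column per variable.
X_var = X.transpose(0, 2, 1).reshape(n * k, m)

# Index mapping for a single measurement: batch i, variable j, time t.
i, j, t = 1, 1, 2
value_batchwise = X_batch[i, t * m + j]
value_variablewise = X_var[i * k + t, j]
```

In the batch-wise layout each batch becomes a single observation, so mean centring each column removes the average trajectory of every variable; in the variable-wise layout each time point is an observation and only the overall mean of each variable is removed.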

Model building for fault detection and diagnosis

Process data from 144 batches were collected, comprising 17 on-line variables recorded hourly during the main fermentation (∼140 hours) and 53 off-line variables recorded from operations preceding and during the fermentation stage (X). The final yield was used as an indicator of overall process performance (Y). The 65 batches with high yield were assumed to represent desirable operation (normal) and were used for model development. A further 44 batches resulted in low yield, but no abnormalities could
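The general idea of building a model from normal batches and testing new data against it can be sketched as follows. This is a generic PCA monitoring sketch on synthetic data, not the authors' model: the function names, the two-component choice and the simulated fault are assumptions, and no control limits are derived here. It computes the two statistics usually charted in MSPC, Hotelling's T² on the scores and the squared prediction error (SPE) on the residuals.

```python
import numpy as np

def fit_pca_monitor(X_normal, n_components):
    """Fit a PCA monitoring model on data from normal operation."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Xn = (X_normal - mean) / std
    # SVD of the normalized data: rows of Vt are loadings, s gives score variances.
    _, s, Vt = np.linalg.svd(Xn, full_matrices=False)
    P = Vt[:n_components].T
    score_var = s[:n_components] ** 2 / (Xn.shape[0] - 1)
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}

def monitor(model, x):
    """Return (T2, SPE) for one observation x."""
    xn = (x - model["mean"]) / model["std"]
    t = xn @ model["P"]                       # scores
    residual = xn - t @ model["P"].T          # part not explained by the model
    t2 = float(np.sum(t ** 2 / model["score_var"]))
    spe = float(residual @ residual)
    return t2, spe

# Synthetic "normal" data: 65 observations of 6 variables driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(65, 2))
W = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X_normal = latent @ W + 0.05 * rng.normal(size=(65, 6))
model = fit_pca_monitor(X_normal, n_components=2)
train_stats = np.array([monitor(model, x) for x in X_normal])

# Simulated fault: one variable breaks its usual correlation with the others.
x_fault = X_normal[0].copy()
x_fault[0] += 5.0 * model["std"][0]
t2_fault, spe_fault = monitor(model, x_fault)
```

A fault that breaks the correlation structure shows up mainly in the SPE, whereas an unusually large but structurally consistent deviation shows up in T²; inspecting the individual terms of `residual ** 2` gives the per-variable contributions.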

Model building for performance estimation: Principal Component Regression

It is often interesting to relate readily available variables to process parameters that are difficult to measure by using predictive models to provide estimates. Although estimation algorithms have evolved rapidly over the last decades, they are only beginning to find their way into bioprocess applications and often traditional multiple linear regression techniques are still in use despite computational difficulties when dealing with correlated data [11]. Numerical problems are resolved by PCA
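A minimal sketch of principal component regression on synthetic data follows, to show why regressing on scores sidesteps the numerical difficulty with correlated inputs. The function names `pcr_fit` and `pcr_predict`, the data and the number of components are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading PCA scores of X."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    Xn = (X - x_mean) / x_std
    _, _, Vt = np.linalg.svd(Xn, full_matrices=False)
    P = Vt[:n_components].T                   # loadings of the retained components
    T = Xn @ P                                # scores are mutually orthogonal
    # Least squares on orthogonal scores is well conditioned.
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean, "P": P, "b": b}

def pcr_predict(model, X):
    Xn = (X - model["x_mean"]) / model["x_std"]
    return model["y_mean"] + (Xn @ model["P"]) @ model["b"]

# Synthetic data: the first two inputs are almost perfectly collinear.
rng = np.random.default_rng(2)
z, w = rng.normal(size=100), rng.normal(size=100)
X = np.column_stack([z, z + 1e-6 * rng.normal(size=100), w])
y = 3.0 * z + w + 0.1 * rng.normal(size=100)

condition_number = np.linalg.cond(np.corrcoef(X, rowvar=False))
model = pcr_fit(X, y, n_components=2)
y_hat = pcr_predict(model, X)
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

The correlation matrix of the raw inputs is nearly singular here, which is what makes ordinary multiple linear regression unstable; discarding the near-zero-variance component removes that direction before the regression step.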

Variable influences

Another task frequently requested of modelling exercises is to identify the variables that are major process drivers. Data from an industrial environment are usually unsuitable for understanding variable influences because the variables are strictly controlled and therefore their effects cannot be investigated. The most appropriate knowledge about variable influences originates from fundamental process understanding or designed experimentation. In fact, prior knowledge concerning

MSPC in on-line monitoring of batch processes

The off-line analysis of historical data provided a useful tool for learning from data and suggested that multivariate transformations can potentially outperform univariate process monitoring. Real benefit results only if the above process deviations are detected and diagnosed in real time through the development of on-line control limits on the scores and SPE statistics with their contributions.

The on-line control limits encapsulate natural process variation at each time interval, which is assumed
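One simple way to picture limits that follow the natural variation at each time interval is sketched below. The mean ± 3 standard deviation band is an assumption made for illustration (it implicitly treats the statistic as roughly normally distributed at each hour); the article's own distributional assumption is cut off in this snippet, and the trajectories here are synthetic.

```python
import numpy as np

def time_varying_limits(trajectories, width=3.0):
    """Limits per time interval from normal-batch trajectories [n_batches, k]."""
    centre = trajectories.mean(axis=0)
    spread = trajectories.std(axis=0, ddof=1)
    return centre - width * spread, centre + width * spread

def first_violation(trajectory, lower, upper):
    """Index of the first time interval outside the limits, or None."""
    outside = (trajectory < lower) | (trajectory > upper)
    return int(np.argmax(outside)) if outside.any() else None

# Synthetic score trajectories for 65 normal batches over 140 hourly intervals.
rng = np.random.default_rng(3)
hours = np.arange(140)
normal_runs = np.sin(hours / 20.0) + rng.normal(size=(65, 140))
lower, upper = time_varying_limits(normal_runs)

# A hypothetical new batch that follows the average until a step change at hour 100.
new_batch = normal_runs.mean(axis=0)
new_batch[100:] += 10.0
alarm_hour = first_violation(new_batch, lower, upper)
```

The same construction can be applied to the trajectory of any monitored statistic, for example each retained score or the SPE, although a one-sided limit is more natural for the SPE.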

MSPC in performance forecasting

It would be of quantifiable benefit to estimate the eventual performance at early stages of the fermentation, because no performance measure is available until the batch terminates. Nomikos and MacGregor [12] proposed a methodology for using MPCA approaches in forecasting final batch performance. These approaches are based on assuming certain process behaviour for the rest of the batch duration, such as: (1) average behaviour of runs used for model building; (2) persistent deviations from average
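The first two assumptions listed above can be sketched as ways of completing a partly observed batch before it is passed to a model that expects a full-length trajectory. The list is cut off in this snippet, so only options (1) and (2) are shown; the function name `complete_batch` and the toy numbers are hypothetical, and the data are assumed to be already expressed as normalized deviations from the average trajectory.

```python
import numpy as np

def complete_batch(x_partial, k, mode):
    """Fill in the unobserved part of a batch.

    x_partial: normalized deviations observed so far, shape [m, t_now]
    k:         total number of time intervals in a full batch
    mode:      "average" or "persistent"
    """
    m, t_now = x_partial.shape
    full = np.zeros((m, k))
    full[:, :t_now] = x_partial
    if mode == "average":
        # Zero deviation means the remainder follows the average trajectory.
        pass
    elif mode == "persistent":
        # The most recent deviation is held constant for the rest of the batch.
        full[:, t_now:] = x_partial[:, -1:]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return full

# Two variables observed for the first 2 of 5 intervals (toy numbers).
x_partial = np.array([[0.5, 1.0],
                      [-0.2, -0.4]])
filled_average = complete_batch(x_partial, k=5, mode="average")
filled_persistent = complete_batch(x_partial, k=5, mode="persistent")
```

The completed matrix can then be unfolded into a single row and projected onto the model built from full-length batches, so that a forecast is available at every time interval of a running batch.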

Implementation

A prototype off-line package was developed (Matlab) to perform various data pre-processing tasks, such as screening the data for missing values, outliers and noise, and selecting inputs. The graphical user interfaces allow the user to easily create, save and re-use multivariate models, and to visually present the models, the data and various statistics. The off-line tool includes a simulation of the on-line scenario, whereby the user can ‘play back’ the on-line multivariate monitoring charts of any

Discussion and industrial perspectives

Although the application of data-based techniques is appealing because of the relatively low resource requirements and rapid model development times, one of the main lessons learnt through this application is the significance of representative data. There appears to be a general feeling that raw information from industrial instrumentation does not provide sufficient insights into microbial behaviour. Although data quality is difficult to define, it is vital to the success of data-based

Acknowledgements

The authors would like to express their appreciation to S. Martin for sharing his knowledge about tylosin fermentation, D. Keates and D. Range for the on-line implementation, P. Mohan for managing this project, and all colleagues at Eli Lilly and Company Ltd who made this development possible.

Glossary

Expert Systems
A supervisory control system that makes use of expert knowledge in the form of rules to advise operators on process problems and control actions. The simple logic of associative decision rules is easily understood and accepted by humans; therefore ‘if-then’ rules are often used in the above context in the form of ‘If X is true and Y is false, conclude class 1’.
Interactions
Chemical and biological systems are often highly complex and present non-linear and dynamic dependence

References (16)

  • Glassey, J. (2000) Issues in industrial advisory system development. Trends Biotech.
  • Martin, E.B. et al. (1996) Non-parametric confidence bounds for process performance monitoring charts. J. Process Control
  • Montgomery, D.C. (1996) Introduction to Statistical Quality Control, Wiley &...
  • Duke, P. (1992) KAT – A Knowledge Acquisition Technique, Methodology Manual, CK Design,...
  • Joliffe, I.T. (1986) Principal Component Analysis
  • Wise, B.M. (1990) A theoretical basis for the use of principal components models for monitoring multivariate processes. Process Control and Quality
  • MacGregor, J.F. (1994) Statistical Process Control of Multivariate Processes. IFAC World Congress, 1994, Kyoto,...
  • Geman, S. (1992) Neural networks and bias/variance dilemma. Neural Computation
There are more references available in the full text version of this article.

Cited by (79)

  • Investigation of weight loss in mozzarella cheese using NIR predicted chemical composition and multivariate analysis

    2021, Journal of Food Composition and Analysis
    Citation Excerpt :

    For example, unexpected values in total Ca of curd can be easily tracked with LR and can inform technician of a possible problem for the shelf-life of the product (Mason et al., 1997). Finally, only PCR allows to rapidly check variability in parameters known to directly influence Ca content of curd, such as curd pH and moisture content, and to correct them (Albert and Kinley, 2001). In the present paper, PCR and LR coupled with variable selection were applied to evaluate and predict process performance in high-moisture mozzarella manufacturing.

  • Industrial batch process monitoring with limited data

    2019, Journal of Process Control
    Citation Excerpt :

    The deployment of BPM along with other on-line, at-line, and in-line sensors and analyzers in biotechnology have received significant attention in recent years [2,11–13]. In fact successful applications of BPM in biomanufacturing include monitoring of microbial fermentation reactors [14–16], cell culture [17,18] and purification processes [19,20]. Despite over two decades of research and advancement in BPM, the BPM framework in biomanufacturing suffers from a unique challenge – the ‘Low-N’ problem (or small-data problem).

  • In-Depth Evaluation of Data Collected During a Continuous Pharmaceutical Manufacturing Process: A Multivariate Statistical Process Monitoring Approach

    2019, Journal of Pharmaceutical Sciences
    Citation Excerpt :

    Batch SPM (BSPM) is the extension of MSPM suitable for the monitoring of batch processes.12 Several examples of the application of MSPM or BSPM can be found in literature for the monitoring of analytical methods,13,14 bioprocesses,15,16 chemical processes,10,17,18 and processes from several other industries such as the food,19 paper,20 metallurgical,21 and petrochemical22 industries, among others. However, few reports exist for MSPM/BSPM of pharmaceutical production processes.

  • A review of control strategies for manipulating the feed rate in fed-batch fermentation processes

    2017, Journal of Biotechnology
    Citation Excerpt :

    The Hotelling T² statistic can be used to measure the variability on the model, and the squared prediction error (SPE) evaluates the error between the model and the data values. Although there are many literature examples where multivariate methods are applied for process modelling and monitoring (Doan and Srinivasan, 2008; Ferreira et al., 2007; Glassey, 2013; Mears et al., 2016), there are few examples found where statistical methods are directly applied for control (Albert and Kinley, 2001; Duran-Villalobos et al., 2016). This may be due to some of the disadvantages of the method.
