Trends in Biotechnology
Review
Multivariate statistical monitoring of batch processes: an industrial case study of fermentation supervision
Section snippets
Process description and data availability
Tylosin fermentation has been chosen as an example of a complex secondary metabolite production process. Tylosin production, as with most fermentation processes, involves various stages before and subsequent to the main fermentation, which has traditionally been the favoured area for improvements. However, deviations that influence final productivity might occur before the final stage and therefore it is clearly beneficial to focus on preceding operations of the fermentation. Tylosin production
Principles of MSPC
The principles of MSPC have been published widely [4, 5, 6] and therefore only a brief summary is given here. PCA involves finding the eigenvalues of the sample covariance matrix, which are the variances of the principal components. For a normalized (mean-centred, variance-scaled) sample matrix X [n, m] with n samples and m variables, PCA finds m uncorrelated new variables, the variance of which decreases from first to last. Let the new variables be represented by t_i for a particular sample i as
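The PCA step summarised above can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function name `pca_scores` and the synthetic data are assumptions made for illustration, and the scores are computed with the standard projection of the normalized data onto the eigenvectors.

```python
import numpy as np

def pca_scores(X):
    """PCA via eigendecomposition of the sample covariance matrix.

    X is an [n, m] array (n samples, m variables). Returns the normalized
    data, the eigenvalues in descending order, the loadings P [m, m] and
    the scores T [n, m].
    """
    # Normalize: mean centre and scale each variable to unit variance.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Sample covariance of the normalized data.
    S = np.cov(Xs, rowvar=False)
    # eigh returns ascending eigenvalues for symmetric matrices; reorder them.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, P = eigvals[order], eigvecs[:, order]
    T = Xs @ P  # one row of scores per sample
    return Xs, eigvals, P, T

# Synthetic data (illustrative only): 50 samples, 4 variables, two correlated.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
X_demo[:, 1] += 0.8 * X_demo[:, 0]
Xs_demo, eigvals_demo, P_demo, T_demo = pca_scores(X_demo)
```

In this sketch the sample variance of each score column equals the corresponding eigenvalue, and the score columns are mutually uncorrelated, which is the property the summary relies on.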
MSPC for batch processes
Batch data differ from continuous data in that the problem is now 3D: time (k) is added as a dimension alongside the number of variables logged (m) and the number of batches (n). The idea of arranging batch data in a meaningful fashion originates from Nomikos and MacGregor [8, 9]. They suggested a simple way to view batch data as a 3D data matrix constructed from layers of batches stacked onto each other (X [n, m, k]), that can be unfolded into 2D arrays in two different
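As a hedged illustration of unfolding, the sketch below shows two commonly used ways of turning an X [n, m, k] array into a 2D array: batch-wise (one row per batch) and variable-wise (one row per batch and time point). The snippet above is truncated, so this does not claim to reproduce the authors' exact arrangement; the function names and column ordering are assumptions.

```python
import numpy as np

def unfold_batchwise(X3):
    """[n, m, k] -> [n, k*m]: one row per batch, columns grouped by time."""
    n, m, k = X3.shape
    # Put time ahead of variables so the m variables of a time slice are adjacent.
    return X3.transpose(0, 2, 1).reshape(n, k * m)

def unfold_variablewise(X3):
    """[n, m, k] -> [n*k, m]: one row per (batch, time) pair."""
    n, m, k = X3.shape
    return X3.transpose(0, 2, 1).reshape(n * k, m)

# Tiny illustrative array: 3 batches, 2 variables, 4 time points.
n_b, m_v, k_t = 3, 2, 4
X3_demo = np.arange(n_b * m_v * k_t, dtype=float).reshape(n_b, m_v, k_t)
Xb_demo = unfold_batchwise(X3_demo)     # shape (3, 8)
Xv_demo = unfold_variablewise(X3_demo)  # shape (12, 2)
```

With this ordering, variable v of batch i at time t sits in column t*m + v of the batch-wise array and in row i*k + t of the variable-wise array.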
Model building for fault detection and diagnosis
Process data from 144 batches were collected, comprising 17 on-line variables recorded hourly during the main fermentation (∼140 hours) and 53 off-line variables recorded from operations preceding and during the fermentation stage (X). The final yield was used as an indicator of overall process performance (Y). The 65 batches with high yield were assumed to represent desirable operation (normal) and were used for model development. Forty-four batches resulted in low yield, but no abnormalities could
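A minimal sketch of this kind of model building follows, assuming synthetic data in place of the plant data. A PCA model is fitted to reference batches only, and new observations are judged by two standard monitoring statistics: Hotelling's T² on the retained scores and the squared prediction error (SPE) of the residual. The function names, the choice of two components, and the empirical 99th-percentile limit are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fit_pca_model(X_ref, n_components):
    """Fit a PCA monitoring model on batches assumed to be normal."""
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)
    Xs = (X_ref - mean) / std
    # SVD of the scaled data: the rows of Vt are the loading vectors.
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T
    score_var = s[:n_components] ** 2 / (Xs.shape[0] - 1)
    return {"mean": mean, "std": std, "P": P, "score_var": score_var}

def monitor(model, X):
    """Return Hotelling's T^2 and SPE for each row of X."""
    Xs = (X - model["mean"]) / model["std"]
    T = Xs @ model["P"]
    t2 = np.sum(T**2 / model["score_var"], axis=1)
    residual = Xs - T @ model["P"].T
    spe = np.sum(residual**2, axis=1)
    return t2, spe

# Synthetic reference set: 65 "normal" rows, 6 variables, 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(65, 2))
X_ref = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(65, 6))
mon_model = fit_pca_model(X_ref, n_components=2)
t2_ref, spe_ref = monitor(mon_model, X_ref)
spe_limit = np.percentile(spe_ref, 99)  # empirical limit (an assumption)

# A fault that breaks the correlation structure: offset a single variable.
x_fault = X_ref[:1].copy()
x_fault[0, 3] += 5.0 * mon_model["std"][3]
t2_fault, spe_fault = monitor(mon_model, x_fault)
fault_detected = bool(spe_fault[0] > spe_limit)
```

The SPE responds to the fault because the offset moves the observation away from the plane spanned by the retained components, which is the kind of event a univariate chart on each variable may miss.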
Model building for performance estimation: Principal Component Regression
It is often useful to relate readily available variables to process parameters that are difficult to measure, using predictive models to provide estimates. Although estimation algorithms have evolved rapidly over recent decades, they are only beginning to find their way into bioprocess applications, and traditional multiple linear regression techniques are often still in use despite computational difficulties when dealing with correlated data [11]. Numerical problems are resolved by PCA
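Principal component regression can be sketched as follows, under the assumption of synthetic, strongly correlated inputs. The response is regressed on a small number of PCA scores instead of on the original variables; because the scores are orthogonal, the least-squares step is well conditioned. The function names `fit_pcr` and `predict_pcr` and the choice of two components are illustrative, not taken from the paper.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: PCA on X, then least squares on scores."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    Xs = (X - x_mean) / x_std
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T
    T = Xs @ P
    # The score columns are mutually orthogonal, so this fit is well conditioned.
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean, "P": P, "b": b}

def predict_pcr(model, X):
    """Predict the response for new rows of X."""
    Xs = (X - model["x_mean"]) / model["x_std"]
    return model["y_mean"] + (Xs @ model["P"]) @ model["b"]

# Synthetic example: six correlated inputs driven by two latent factors.
rng = np.random.default_rng(2)
z = rng.normal(size=(80, 2))
X_pcr = z @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(80, 6))
y_pcr = 3.0 * z[:, 0] - 2.0 * z[:, 1] + 0.1 * rng.normal(size=80)

pcr_model = fit_pcr(X_pcr, y_pcr, n_components=2)
y_hat = predict_pcr(pcr_model, X_pcr)
r2 = 1.0 - np.sum((y_pcr - y_hat) ** 2) / np.sum((y_pcr - y_pcr.mean()) ** 2)
```

In practice the number of components would be chosen by cross-validation; it is fixed here only to keep the sketch short.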
Variable influences
Another task frequently requested of modelling exercises is to identify the variables that are major process drivers. An industrial environment usually cannot deliver data suitable for understanding variable influences because the variables are strictly controlled and therefore their effects cannot be investigated. The most appropriate knowledge about variable influences originates from fundamental process understanding or designed experimentation. In fact, prior knowledge concerning
MSPC in on-line monitoring of batch processes
The off-line analysis of historical data provided a useful tool for learning from data and suggested that multivariate transformations can potentially outperform univariate process monitoring. Real benefit results only if the process deviations described above are detected and diagnosed in real time, through the development of on-line control limits on the scores and SPE statistics with their contributions.
The online control limits encapsulate natural process variation at each time interval, which is assumed
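One simple way to obtain limits that follow natural variation at each time interval is sketched below. It assumes that a monitored statistic (for example a score) has already been computed for every reference batch at every time interval, and it places limits at the mean plus or minus three standard deviations per interval. The three-sigma rule, the function names and the synthetic profile are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def time_varying_limits(ref, n_sigma=3.0):
    """ref: [n_batches, k] trajectories of a statistic from normal batches."""
    centre = ref.mean(axis=0)
    spread = ref.std(axis=0, ddof=1)
    return centre - n_sigma * spread, centre + n_sigma * spread

def first_alarm(trajectory, lower, upper):
    """Index of the first time interval outside the limits, or None."""
    outside = (trajectory < lower) | (trajectory > upper)
    hits = np.flatnonzero(outside)
    return int(hits[0]) if hits.size else None

# Synthetic reference trajectories: 65 batches, 140 hourly intervals.
rng = np.random.default_rng(3)
k_len = 140
profile = np.sin(np.linspace(0.0, np.pi, k_len))  # hypothetical nominal profile
ref_scores = profile + 0.1 * rng.normal(size=(65, k_len))
lower_lim, upper_lim = time_varying_limits(ref_scores)

# A new batch that follows the profile, then deviates from interval 60 onwards.
new_batch = profile.copy()
new_batch[60:] += 1.0
alarm_at = first_alarm(new_batch, lower_lim, upper_lim)
no_alarm = first_alarm(profile, lower_lim, upper_lim)
```

Because the limits are computed per interval, they widen and narrow with the reference data instead of applying one fixed band to the whole batch.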
MSPC in performance forecasting
It would be of quantifiable benefit to estimate the eventual performance at early stages of the fermentation, because no performance measure is available until the batch terminates. Nomikos and MacGregor [12] proposed a methodology for using MPCA approaches in forecasting final batch performance. These approaches are based on assuming certain process behaviour for the rest of batch duration, such as: (1) average behaviour of runs used for model building; (2) persistent deviations from average
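The first two assumptions can be sketched in a few lines, assuming the measurements have already been mean-centred and scaled against the reference batches so that a value of zero means average behaviour. The function name `complete_batch` and the time-major layout of the output row are illustrative choices, not taken from the paper.

```python
import numpy as np

def complete_batch(observed, k_total, method="average"):
    """Fill the unobserved remainder of a running batch.

    observed: [k_now, m] mean-centred, scaled measurements available so far.
    Returns a [1, k_total*m] row that a batch-wise model could project.
    """
    k_now, m = observed.shape
    # Zero deviation corresponds to the average trajectory of the reference runs.
    future = np.zeros((k_total - k_now, m))
    if method == "persistent":
        # Assume the current deviation from average persists to the end.
        future[:] = observed[-1]
    elif method != "average":
        raise ValueError(f"unknown method: {method}")
    return np.vstack([observed, future]).reshape(1, -1)

# Two variables observed for 2 of 4 time intervals (made-up numbers).
observed_demo = np.array([[0.1, -0.2],
                          [0.3, 0.4]])
full_avg = complete_batch(observed_demo, 4, "average")
full_per = complete_batch(observed_demo, 4, "persistent")
```

The completed row can then be passed to a previously fitted monitoring or regression model to obtain a provisional estimate that is updated as each new interval arrives.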
Implementation
A prototype off-line package was developed in Matlab to perform various data pre-processing tasks, such as screening data for missing values, outliers and noise, and input selection. The graphical user interfaces allow the user to easily create, save and re-use multivariate models, and to visually present the models, data and various statistics. The off-line tool includes a simulation of the on-line scenario, whereby the user can ‘play back’ the on-line multivariate monitoring charts of any
Discussion and industrial perspectives
Although the application of data based techniques is appealing because of the relatively low resource requirements and rapid model development times, one of the main lessons learnt through this application is the significance of representative data. It appears to be a general feeling that raw information from industrial instrumentation does not provide sufficient insights into microbial behaviour. Although data quality is difficult to define, it is vital to the success of data based
Acknowledgements
The authors would like to express their appreciation to S. Martin for sharing his knowledge about tylosin fermentation, D. Keates and D. Range for the on-line implementation, P. Mohan for managing this project, and all colleagues at Eli Lilly and Company Ltd who made this development possible.
Glossary
- Expert Systems
- A supervisory control system that makes use of expert knowledge in the form of rules to advise operators on process problems and control actions. The simple logic of associative decision rules is easily understood and accepted by humans; therefore ‘if-then’ rules are often used in this context, in the form of ‘If X is true and Y is false, conclude class 1’.
- Interactions
- Chemical and biological systems are often highly complex and present complex, non-linear and dynamic dependence
References (16)
- Issues in industrial advisory system development. Trends Biotech. (2000)
- et al. Non-parametric confidence bounds for process performance monitoring charts. J. Process Control (1996)
- Montgomery, D.C. (1996) Introduction to Statistical Quality Control, Wiley &...
- Duke, P. (1992) KAT – A Knowledge Acquisition Technique, Methodology Manual, CK Design,...
- Principal Component Analysis (1986)
- A theoretical basis for the use of principal components models for monitoring multivariate processes. Process Control and Quality (1990)
- MacGregor, J.F. (1994) Statistical Process Control of Multivariate Processes. IFAC World Congress, 1994, Kyoto,...
- Neural networks and bias/variance dilemma. Neural Computation (1992)