Abstract
Evaporation is a common and often lengthy step in herbal medicine manufacturing. Degradation of evaporation performance is inevitable; it increases steam and electricity consumption and may also affect the content of thermosensitive components. Industrial information systems now collect large volumes of evaporation process data in which process knowledge is hidden, yet these data are seldom analyzed in depth. In this work, an exploratory data analysis workflow is proposed to evaluate evaporation performance and to identify the root causes of performance degradation. The workflow consists of six steps: data collection, preprocessing, characteristic stage identification, feature extraction, model development and interpretation, and decision making. In the model development and interpretation step, the workflow uses the HDBSCAN clustering algorithm to annotate the data and then applies the ccPCA method to compare the clusters for root cause analysis. A full-scale case is presented to verify the effectiveness of the workflow. In this case, evaporation process data from 192 batches manufactured in 2018 were collected. Following the steps of the workflow, the features of each batch were extracted and the batches were clustered into six groups. ccPCA identified high P_v,II and high L_I as the root causes of the performance degradation. Recommendations for future manufacturing were made on the basis of these results. The case shows that the proposed workflow can determine the root causes of evaporation performance degradation.
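The cluster comparison in the model development and interpretation step can be illustrated with a minimal contrastive PCA sketch. This is not the authors' implementation or the published ccPCA code. It assumes cluster labels are already available (for example from HDBSCAN), it runs on synthetic data, and the function name `contrastive_direction`, the contrast strength `alpha`, and the choice of the full data set as target with the remaining batches as background are all assumptions made for illustration.

```python
import numpy as np

def contrastive_direction(X, labels, cluster, alpha=1.0):
    """Top contrastive direction for one cluster (simplified contrastive PCA).

    Assumption: target = all batches, background = batches outside `cluster`.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each feature
    background = X[labels != cluster]
    cov_target = np.cov(X, rowvar=False)
    cov_background = np.cov(background, rowvar=False)
    # Directions with high target variance but low background variance
    eigvals, eigvecs = np.linalg.eigh(cov_target - alpha * cov_background)
    return eigvecs[:, -1]                         # eigh sorts eigenvalues ascending

rng = np.random.default_rng(0)
# Hypothetical data: 300 batches x 3 features; cluster 1 is shifted in feature 0.
X = rng.normal(size=(300, 3))
labels = np.repeat([0, 1], [200, 100])
X[labels == 1, 0] += 5.0

w = contrastive_direction(X, labels, cluster=1)
top_feature = int(np.argmax(np.abs(w)))           # feature that contributes most
print("contrastive loadings:", np.round(w, 3), "top feature:", top_feature)
```

The feature with the largest absolute loading is the one that most separates the chosen cluster from the other batches, which is the kind of evidence the workflow uses to point at candidate root causes.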
Funding
This work was supported by the National Science and Technology Major Project of China. Grant Number: 2018ZX09201011‐002.
Author information
Contributions
Sheng Zhang: Conceptualization, Software, Methodology, Formal analysis, Data curation, Investigation, Writing—Original Draft, Visualization. Xinyuan Xie: Methodology, Data curation, Investigation, Validation. Haibin Qu: Conceptualization, Writing—Review & Editing, Supervision, Project administration, Funding acquisition.
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhang, S., Xie, X. & Qu, H. A data-driven workflow for evaporation performance degradation analysis: a full-scale case study in the herbal medicine manufacturing industry. J Intell Manuf 34, 651–668 (2023). https://doi.org/10.1007/s10845-021-01816-w
DOI: https://doi.org/10.1007/s10845-021-01816-w