Published by De Gruyter September 15, 2015

Data cleaning in the process industries

Shu Xu, Bo Lu, Michael Baldea, Thomas F. Edgar, Willy Wojsznis, Terrence Blevins and Mark Nixon

Abstract

In recent decades, process engineers have faced a growing number of data analytics challenges and have had difficulty extracting valuable information from a wealth of process variable data trends. Raw data stored in databases in different formats are not useful until they are cleaned and transformed. Generally, data cleaning consists of four steps: missing data imputation, outlier detection, noise removal, and time alignment and delay estimation. This paper discusses available data cleaning methods that can be used in data pre-processing and that help overcome the challenges of “Big Data”.
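
The four steps named above can be sketched on synthetic data as follows. This is only a minimal illustration, not the set of methods surveyed in the paper: linear interpolation, a Hampel identifier, a first-order exponential filter and a cross-correlation delay estimate are simple stand-ins chosen here for brevity, and every function name, window size, threshold and signal parameter is an assumption.

```python
import numpy as np


def impute_linear(x):
    """Step 1: fill NaN gaps by linear interpolation between valid neighbours."""
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    idx = np.arange(x.size)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x


def hampel_flags(x, half_window=5, n_sigmas=3.0):
    """Step 2: flag samples far from the local median (Hampel identifier)."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        window = x[max(0, i - half_window): i + half_window + 1]
        med = np.median(window)
        # 1.4826 scales the MAD to a standard deviation for Gaussian data.
        mad = 1.4826 * np.median(np.abs(window - med))
        flags[i] = abs(x[i] - med) > n_sigmas * mad
    return flags


def ewma_filter(x, alpha=0.3):
    """Step 3: first-order exponential filter, y[k] = a*x[k] + (1-a)*y[k-1]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for k in range(1, x.size):
        y[k] = alpha * x[k] + (1.0 - alpha) * y[k - 1]
    return y


def estimate_delay(u, y, max_lag=20):
    """Step 4: lag (in samples) that maximises the cross-correlation of u and y."""
    u = np.asarray(u, dtype=float) - np.mean(u)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = u.size
    scores = [np.dot(u[: n - lag], y[lag:]) for lag in range(max_lag + 1)]
    return int(np.argmax(scores))


# Synthetic data; all numbers below are illustrative assumptions.
rng = np.random.default_rng(0)
n, true_delay = 500, 7
u = np.zeros(n)
for k in range(1, n):
    u[k] = 0.9 * u[k - 1] + rng.standard_normal()    # slowly varying input
y_raw = np.concatenate((np.zeros(true_delay), u[:-true_delay]))
y_raw = y_raw + 0.1 * rng.standard_normal(n)         # measurement noise
spike_idx = np.array([120, 260, 400])
y_raw[spike_idx] += 40.0                             # gross errors
y_raw[[50, 51, 52, 200, 333]] = np.nan               # dropped samples

y_filled = impute_linear(y_raw)                      # 1. imputation
flags = hampel_flags(y_filled)                       # 2. outlier detection
y_clean = impute_linear(np.where(flags, np.nan, y_filled))
y_smooth = ewma_filter(y_clean)                      # 3. noise removal
est_delay = estimate_delay(u, y_clean)               # 4. delay estimation
print("flagged samples:", int(flags.sum()), "estimated delay:", est_delay)
```

In this sketch the delay is estimated from the despiked but unfiltered signal, because a causal filter such as the exponential filter adds lag of its own and would bias the estimate.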


Corresponding author: Thomas F. Edgar, McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 78712, USA, e-mail:

About the authors

Shu Xu

Shu Xu is a PhD candidate in the Department of Chemical Engineering at the University of Texas at Austin. His research focuses on data cleaning and data analytics in process industries. He received his Bachelor’s degree from the Department of Chemical Engineering at Tianjin University, China.

Bo Lu

Bo Lu is a PhD candidate in the Department of Chemical Engineering at the University of Texas at Austin. His research focuses on data-driven modeling and process monitoring of batch processes in chemical manufacturing industries, specifically the development of PLS-based inferential sensors for batch processes. He received his bachelor’s degree from the Department of Chemical Engineering at the University of Alberta in Edmonton, Canada.

Michael Baldea

Michael Baldea is an Assistant Professor in the McKetta Department of Chemical Engineering at the University of Texas at Austin. He obtained his Diploma (2000) and MSc (2001) from “Babes-Bolyai” University in Cluj-Napoca, Romania, and his PhD (2006) from the University of Minnesota, all in chemical engineering. His research concentrates on the modeling, analysis, optimization and control of process and energy systems, areas in which he has published over 60 refereed papers. He is the recipient of several research and service awards, including the NSF CAREER Award, the Moncrief Grand Challenges Prize, the Model-Based Innovation Prize from Process Systems Enterprise, and the Best Referee Award from the Journal of Process Control.

Thomas F. Edgar

Thomas F. Edgar is the Abell Chair in Engineering Professor of Chemical Engineering at the University of Texas at Austin and Director of the UT Energy Institute. Dr. Edgar received his BS degree in chemical engineering from the University of Kansas and a PhD from Princeton University. For the past 40 years, he has concentrated his academic work in process modeling, control, and optimization, with over 400 articles and book chapters. Dr. Edgar has received major awards from AIChE, ASEE, and AACC and is a member of the National Academy of Engineering.

Willy Wojsznis

Willy Wojsznis earned his Engineering Degree in Electrical Engineering in 1964 and his PhD from the Technical University of Warsaw in 1973. Since 1991, Willy has been with Emerson Process Management, developing advanced control products. Recently he has been involved in Big Data research. His research and development work has resulted in 38 US patents and over 50 technical conference and journal papers. He coauthored the ISA bestselling books Advanced Control Unleashed, Advanced Control Foundation, and Wireless Control Foundation, and a chapter of the ISA/CRC instrumentation handbook. Willy has been inducted into Control Magazine’s Process Automation Hall of Fame and is an ISA Fellow and IEEE senior member.

Terrence Blevins

Terrence Blevins received a Master of Science in Electrical Engineering from Purdue University in 1973. He led the development of DeltaV advanced control products and coauthored the book Wireless Control Foundation and the ISA bestselling books Advanced Control Foundation and Control Loop Foundation. Terry is a member of Control Magazine’s Process Automation Hall of Fame and an ISA Fellow. Presently, he is a principal technologist in the applied research team at Emerson Process Management.

Mark Nixon

Mark Nixon was lead architect for DeltaV from its inception through 2005. In 2006 he took a very active role in the design and standardization of WirelessHART. He currently leads the applied research group, where he is pursuing his interests in control, big data analytics, wireless, operator interfaces, and advanced graphics. He holds over 90 patents and has coauthored four books on wireless and control. He is an ISA Fellow and a member of the Automation Hall of Fame. He received his bachelor’s degree from the University of Waterloo in Canada.

Acknowledgments

The authors gratefully acknowledge financial and technical support from Emerson Process Management.

Disclaimer: Material presented in this publication reflects opinions of the authors and not their institutional affiliations.


Kriegel HP, Kröger P, Shubert E, Zimek A. Interpreting and unifying outlier scores. In: Proceedings of 11th SIAM International Conference on Data Mining 2011.10.1137/1.9781611972818.2Search in Google Scholar

Ku W, Storer RH, Georgakis C. Disturbance detection and isolation by dynamic principal component analysis. Chemometr Intell Lab Syst 1995; 30: 179–196. InCINC ’94.10.1016/0169-7439(95)00076-3Search in Google Scholar

Kwak DS, Kim KJ. A data mining approach considering missing values for the optimization of semiconductor-manufacturing processes. Expert Syst Appl 2012; 39: 2590–2596.10.1016/j.eswa.2011.08.114Search in Google Scholar

Lakshminarayan K, Harp S, Samad T. Imputation of missing data in industrial databases. Appl Intell 1999; 11: 259–275.10.1023/A:1008334909089Search in Google Scholar

Lazarevic A, Kumar V. Feature bagging for outlier detection. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05. New York, NY, USA: ACM 2005; 157–166.10.1145/1081870.1081891Search in Google Scholar

Lee J, Kang B, Kang SH. Integrating independent component analysis and local outlier factor for plant-wide process monitoring. J Process Control 2011; 21: 1011–1021.10.1016/j.jprocont.2011.06.004Search in Google Scholar

Leibman M, Edgar T, Lasdon L. Efficient data reconciliation and estimation for dynamic processes using nonlinear programming techniques. Comput Chem Eng 1992; 16: 963–986.10.1016/0098-1354(92)80030-DSearch in Google Scholar

Li W, Bhargava A, Shah SL. Adaptive process monitoring via multichannel EIV lattice filters. AIChE J 2002; 48: 786–799.10.1002/aic.690480413Search in Google Scholar

Liebman M. Reconciliation of process measurements using statistical and nonlinear programming techniques. PhD thesis, University of Texas at Austin 1991.Search in Google Scholar

Little RJA. Missing-data adjustments in large surveys. J Bus Econ Stat 1988; 6: 287–296.Search in Google Scholar

Little RJA, Rubin RB. Statistical analysis with missing data, 2nd ed., New York:Wiley, 2002.10.1002/9781119013563Search in Google Scholar

Liu Y, Chen J. Correntropy Kernel learning for nonlinear system identification with outliers. Ind Eng Chem Res 2014; 53: 5248–5260.10.1021/ie401347kSearch in Google Scholar

Liu H, Shah S, Jiang W. On-line outlier detection and data cleaning. Comput Chem Eng 2004; 28: 1635–1647.10.1016/j.compchemeng.2004.01.009Search in Google Scholar

Ljung GM. On outlier detection in time series. J R Stat Soc Series B Stat Methodol 1993; 55: 559–567.10.1111/j.2517-6161.1993.tb01924.xSearch in Google Scholar

Lopes VV, Menezes JC. Inferential sensor design in the presence of missing data: a case study. Chemometr Intell Lab Syst 2005; 78: 1–10.10.1016/j.chemolab.2004.11.004Search in Google Scholar

Losada RA. Digitial FIlters with MATLAB. The MathWorks, Inc, 2008. URL http://www.mathworks.com/tagteam/55876_digfilt.pdf. Accessed on December 4, 2014.Search in Google Scholar

Lu B, Castillo I, Chiang L, Edgar TF. Industrial PLS model variable selection using moving window variable importance in projection. Chemometr Intell Lab Syst 2014; 135: 90–109.10.1016/j.chemolab.2014.03.020Search in Google Scholar

Lütkepohl H, Saikkonen P, Trenkler C. Testing for the cointegrating rank of a VAR process with level shift at unknown time. Econometrica 2004; 72: 647–662.10.1111/j.1468-0262.2004.00505.xSearch in Google Scholar

Lydon B. Internet of things industrial automation industry exploring and implementing IoT. InTech Magazine 2014; URL https://www.isa.org/standards-and-publications/isa-publications/intech-magazine/2014/mar-apr/cover-story-internet-of-things/. Accessed on October 7, 2014.Search in Google Scholar

Ma Y, Shi H, Ma H, Wang M. Dynamic process monitoring using adaptive local outlier factor. Chemometr Intell Lab Syst 2013; 127: 89–101.10.1016/j.chemolab.2013.06.004Search in Google Scholar

MacGregor J, Kourti T. Statistical process control of multivariate processes. Control Eng Pract 1995; 3: 403–414.10.1016/0967-0661(95)00014-LSearch in Google Scholar

Mallat S. A wavelet tour of signal processing, 3rd ed., The sparse way: Academic Press, 2008.Search in Google Scholar

Mallows CL. On some ttopic in robustness. Technical report, Murray Hill, New Jersey: Bell Telephone Laboratories Technical Memorandum, 1975.Search in Google Scholar

Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH. Big data: the next frontier for innovation, competition, and productivity. The McKinsey Global Institute, McKinsey & Company, 2011. URL http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation. Accessed on October 7,2014.Search in Google Scholar

Marsland S. on-line novelty detection through self-organization, with application to Inspection robotics. PhD thesis, University of Manchester 2001.Search in Google Scholar

Martens H, Næs T. Multivariate calibration, 1st ed., New Jersey: John Wiley & Sons, Ltd., 1989.Search in Google Scholar

Martin R, Thomson D. Robust-resistant spectrum estimation. Proc IEEE 1982; 70: 1097–1115.10.1109/PROC.1982.12434Search in Google Scholar

Martin RD, Yohai VJ. Influence functionals for time series. Ann Stat 1986; 14: 781–818.Search in Google Scholar

McBrayer KF, Edgar TF. Bias detection and estimation in dynamic data reconciliation. J Process Control 1995; 5: 285–289.10.1016/0959-1524(95)00020-QSearch in Google Scholar

Mehmood T, Liland KH, Snipen L, Sæbø S. A review of variable selection methods in partial least squares regression. Chemometr Intell Lab Syst 2012; 118: 62–69.10.1016/j.chemolab.2012.07.010Search in Google Scholar

Miao Y, Su H, Wang W, Chu J. Simultaneous data reconciliation and joint bias and leak estimation based on support vector regression. Comput Chem Eng 2011; 35: 2141–2151.10.1016/j.compchemeng.2011.06.002Search in Google Scholar

Micić AD, Mataušek MR. Optimization of PID controller with higher-order noise filter. J Process Control 2014; 24: 694–700.10.1016/j.jprocont.2013.10.009Search in Google Scholar

Mitchell TM. Machine learning, 1st ed., New York: McGraw-Hill, 1997.Search in Google Scholar

Munoz JC, Chen J. Removal of the effects of outliers in batch process data through maximum correntropy estimator. Chemometr Intell Lab Syst 2012; 111: 53–58.10.1016/j.chemolab.2011.11.007Search in Google Scholar

Muñoz A, Muruzábal J. Self-organizing maps for outlier detection. Neurocomputing 1998; 18: 33–60.10.1016/S0925-2312(97)00068-4Search in Google Scholar

Muteki K, MacGregor JF, Ueda T. Estimation of missing data using latent variable methods with auxiliary information. Chemometr Intell Lab Syst 2005; 78: 41–50.10.1016/j.chemolab.2004.12.004Search in Google Scholar

Nairac A, Townsend N, Carr R, King S, Cowley P, Tarassenko L. A system for the analysis of jet engine vibration data. Integr Comput Aided Eng 1999; 6: 53–66.10.3233/ICA-1999-6106Search in Google Scholar

Narasimhan S, Jordache C. Data reconciliation and gross error detection. Burlington: TX: Gulf Professional Publishing, 1999.10.1016/B978-088415255-2/50002-1Search in Google Scholar

Natrella M. e-Handbook of statistical methods. NIST/SEMATECH 2010. URL http://www.itl.nist.gov/div898/handbook/. Accessed on September 7, 2014.Search in Google Scholar

Nelson PR. The treatment of missing measurements in PCA and PLS models. PhD thesis, McMaster University 2002.Search in Google Scholar

Nelson PR, Taylor PA, MacGregor JF. Missing data methods in PCA and PLS: score calculations with incomplete observations. Chemometr Intell Lab Syst 1996; 35: 45–65.10.1016/S0169-7439(96)00007-XSearch in Google Scholar

Ni B, Xiao D, Shah SL. Time delay estimation for MIMO dynamical systems C with time-frequency domain analysis. J Process Control 2010; 20: 83–94.10.1016/j.jprocont.2009.10.002Search in Google Scholar

Nielsen NPV, Carstensen JM, Smedsgaard Jr. Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping. J Chromatogr A 1998; 805: 17–35.10.1016/S0021-9673(98)00021-1Search in Google Scholar

Okatani T, Deguchi K. On the Wiberg algorithm for matrix factorization in the presence of missing components. Int J Comput Vis 2007; 72: 329–337.10.1007/s11263-006-9785-5Search in Google Scholar

Oppenheim G, Philippe A, de Rigal J. The particle filters and their applications. Chemometr Intell Lab Syst 2008; 91: 87–93.10.1016/j.chemolab.2007.09.010Search in Google Scholar

Orfanidis SJ. Introduction to signal processing. NJ: Prentice Hall, 1996.Search in Google Scholar

Pearson R. Outliers in process modeling and identification. IEEE Trans Control Syst Technol 2002; 10: 55–63.10.1109/87.974338Search in Google Scholar

Pell RJ. Multiple outlier detection for multivariate calibration using robust statistical techniques. Chemometr Intell Lab Syst 2000; 52: 87–104.10.1016/S0169-7439(00)00082-4Search in Google Scholar

Peña D. Influential observations in time series. J Bus Econ Stat 1990; 8: 235–241.Search in Google Scholar

Pison G, Van Aelst S. Analyzing data with robust multivariate methods and diagnostic plots. In: Härdle W, Rónz B, editors. Compstat. Heidelberg: Physica-Verlag, 2002: 165–170.Search in Google Scholar

Prabhu AV, Edgar TF, Good R. Missing data estimation for run-to-run EWMA-controlled processes. Comput Chem Eng 2009; 33: 1861–1869.10.1016/j.compchemeng.2009.05.010Search in Google Scholar

Prakash J, Huang B, Shah SL. Recursive constrained state estimation using modified extended Kalman filter. Comput Chem Eng 2014; 65: 9–17.10.1016/j.compchemeng.2014.02.013Search in Google Scholar

Puwakkatiya-Kankanamage EH, Garca-Muñoz S, Biegler LT. An optimization-based undeflated PLS (OUPLS) method to handle missing data in the training set. J Chemom 2014; 28: 575–584.10.1002/cem.2618Search in Google Scholar

Qin JS. Process data analytics in the era of big data. AIChE J 2014; 60: 3092–3100.10.1002/aic.14523Search in Google Scholar

Qin SJ, Valle S, Piovoso MJ. On unifying multiblock analysis with application to decentralized process monitoring. J Chemom 2001; 15: 715–742.10.1002/cem.667Search in Google Scholar

Quinlan J. Induction of decision trees. Mach Learn 1986; 1: 81–106.10.1007/BF00116251Search in Google Scholar

Quinlan JR. C4.5: Programs for machine learning. Morgan Kaufmann Series in Machine Learning, 1st ed., San Mateo: Morgan kaufmann, 1993.Search in Google Scholar

Rabiner LR, Gold B. Theory and application of digital signal processing. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1975.Search in Google Scholar

Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00. New York, NY, USA: ACM 2000; 427–438.10.1145/342009.335437Search in Google Scholar

Raymond MR, Roberts DM. A comparison of methods for treating incomplete data in selection research. Educ Psychol Meas 1987; 47: 13–26.10.1177/0013164487471002Search in Google Scholar

Ren X, Rad A, Chan P, Lo W. Online identification of continuous-time systems with unknown time delay. IEEE Trans Automat Contr 2005; 50: 1418–1422.10.1109/TAC.2005.854640Search in Google Scholar

Reynolds D. Gaussian mixture models. In: Encyclopedia of biometrics. New York, NY: Springer, 2009: 659–663.Search in Google Scholar

Richard JP. Time-delay systems: an overview of some recent advances and open problems. Automatica 2003; 39: 1667–1694.10.1016/S0005-1098(03)00167-5Search in Google Scholar

Roberts SJ. Novelty detection using extreme value statistics. Vision, Image Signal Proc 1999; 146: 124–129.10.1049/ip-vis:19990428Search in Google Scholar

Roberts S, Tarassenko L. A probabilistic resource allocating network for novelty detection. Neural Comput 1994; 6: 270–284.10.1162/neco.1994.6.2.270Search in Google Scholar

Rokach L, Maimon O. Data mining with decision trees: theory and applications. Series in Machine Perception and Artificial Intelligence– Vol. 81, 2nd ed., Singapore: World Scientific, 2014.10.1142/9097Search in Google Scholar

Rosenblatt F. Principles of neurodynamics, perceptrons and the theory of brain mechanisms, 1st ed., Washington, DC: Spartan Books, 1961.10.21236/AD0256582Search in Google Scholar

Roth PL. Missing data: a conceptual review for applied psychologists. Pers Psychol 1994; 47: 537–560.10.1111/j.1744-6570.1994.tb01736.xSearch in Google Scholar

Rousseeuw PJ. Least median of squares regression. J Am Stat Assoc 1984; 79: 871–880.10.1080/01621459.1984.10477105Search in Google Scholar

Rousseeuw PJ. Multivariate estimation with high breakdown point. Math Stat Appl 1985; B: 283–297.10.1007/978-94-009-5438-0_20Search in Google Scholar

Rousseeuw PJ, Leroy AM. Robust regression and outlier detection. Wiley series in probability and statistics, 3rd ed., Hoboken, New Jersey: John Wiley & Sons, Inc., 1996.Search in Google Scholar

Rousseeuw PJ, Van Driessen K. A Fast Algorithm for the minimum covariance determinant estimator. Technometrics 1999; 41: 212–223.10.1080/00401706.1999.10485670Search in Google Scholar

Rousseeuw P, Yohai V. Robust regression by means of S-estimators. In: Robust and nonlinear time series analysis. New York: Springer-Verlag, 1984: 256–272.Search in Google Scholar

Rubin DB. Multiple imputation for nonresponse in surveys. Wiley series in probability and mathematical statistics, 1st ed., New Jersey: John Wiley & Sons, Ltd., 1987.10.1002/9780470316696Search in Google Scholar

Russell S, Norvig P. Artificial intelligence: a modern approach, 3rd ed., NJ: Prentice Hall, 2009.Search in Google Scholar

Russell EL, Chiang LH, Braatz RD. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometr Intell Lab Syst 2000; 51: 81–93.10.1016/S0169-7439(00)00058-7Search in Google Scholar

Santos TL, Botura PE, Normey-Rico JE. Dealing with noise in unstable dead-time process control. J Process Control 2010; 20: 840–847.10.1016/j.jprocont.2010.05.003Search in Google Scholar

Savitzky A, Golay MJ. Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 1964; 36: 1627–1639.10.1021/ac60214a047Search in Google Scholar

Schafer JL. Analysis of incomplete multivariate data. CRC monographs on statistics & applied probability, 1st ed., Florida: Chapman & Hall/CRC, 1997.Search in Google Scholar

Schafer JL, Graham JW. Missing data: our view of the state of the art. Pyschol Methods 2002; 7: 147–177.10.1037/1082-989X.7.2.147Search in Google Scholar

Schölkopf B, Smola A, Müller KR. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 1998; 10: 1299–1319.10.1162/089976698300017467Search in Google Scholar

Seborg DE, Mellichamp DA, Edgar TF, Doyle III FJ. Process dynamics and control, 3rd ed., New York, NY: John Wiley & Sons, 2010.Search in Google Scholar

Segovia VR, Hägglund T, Åström K. Measurement noise filtering for PID controllers. J Process Control 2014; 24: 299–313.10.1016/j.jprocont.2014.01.017Search in Google Scholar

Serneels S, Verdonck T. Principal component analysis for data containing outliers and missing elements. Comput Stat Data Anal 2008; 52: 1712–1727.10.1016/j.csda.2007.05.024Search in Google Scholar

Shekhar S, Lu CT, Zhang P. Detecting graph-based spatial outliers: algorithms and applications (a summary of results). In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. San Francisco, CA, USA; New York, NY: ACM, 2001; 371–376.Search in Google Scholar

Shen H, Nelson G, Kennedy S, Nelson D, Johnson J, Spiller D, White MR, Kell DB. Automatic tracking of biological cells and compartments using particle filters and active contours. Chemometr Intell Lab Syst 2006; 82: 276–282.10.1016/j.chemolab.2005.07.007Search in Google Scholar

Shubert E, Wojdanowski R, Kriegel HP, Zimek A. On evaluation of outlier rankings and outlier scores. In: Proceedings of 12th SIAM International Conference on Data Mining, Anaheim, CA, USA, 2012.10.1137/1.9781611972825.90Search in Google Scholar

Shum HY, Ikeuchi K, Reddy R. Principal component analysis with missing data and its application to polyhedral object modeling. IEEE Trans Pattern Anal Mach Intell 1995; 17: 854–867.10.1109/34.406651Search in Google Scholar

Silva-Ramírez EL, Pino-Mejas R, López-Coello M, de-la Vega MDC. Missing value imputation on missing completely at random data using multilayer perceptrons. Neural Netw 2011; 24: 121–129.10.1016/j.neunet.2010.09.008Search in Google Scholar

Singh A. Outliers and robust procedures in some chemometric applications. Chemometr Intell Lab Syst 1996; 33: 75–100.10.1016/0169-7439(95)00087-9Search in Google Scholar

Soderstrom T. Integration of on-line data reconciliation and bias identification techniques. PhD thesis, The University of Texas at Austin 2001.Search in Google Scholar

Soderstrom TA, Himmelblau DM, Edgar TF. A mixed integer optimization approach for simultaneous data reconciliation and identification of measurement bias. Control Eng Pract 2001; 9: 869–876.10.1016/S0967-0661(01)00056-9Search in Google Scholar

Tang J, Chen Z, chee Fu AW, Cheung D. A robust outlier detection scheme for large data sets. In: Cheng, Ming-shan, Yu, Philip S, Liu, Bing, editors. Proceedings of the 6th Pacific-Asia conference on advances in knowledge discovery and data mining, Hong Kong, China 2001; London, UK: Springer-Verlag, 2002, 6–8.Search in Google Scholar

Tang J, Chen Z, Fu AWC, Cheung DW. Enhancing effectiveness of outlier detections for low density patterns. In: Advances in knowledge discovery and data mining, Berlin, Heidelberg: Springer, 2002: 535–548.10.1007/3-540-47887-6_53Search in Google Scholar

Tax DMJ, Duin RPW. Support vector data description. Mach Learn 2004; 54: 45–66.10.1023/B:MACH.0000008084.60811.49Search in Google Scholar

Tham MT, Montague GA, Morris AJ, Lant PA. Soft-sensors for process estimation and inferential control. J Process Control 1991; 1: 3–14.10.1016/0959-1524(91)87002-FSearch in Google Scholar

Toprac AJ, Downey DJ, Gupta S. Run-to-run control process for controlling critical dimensions 1999. URL http://www.google.com/patents/US5926690. Accessed on October 20, 2014.Search in Google Scholar

Torr PHS, Murray DW. Outlier detection and motion segmentation. 1993; 2059: 432–443.10.1117/12.150246Search in Google Scholar

Tsay RS. Outliers, level shifts, and variance changes in time series. J Forecasting 1988; 7: 1–20.10.1002/for.3980070102Search in Google Scholar

Tsay RS, Peña D, Pankratz AE. Outliers in multivariate time series. Biometrika 2000; 87: 789–804.10.1093/biomet/87.4.789Search in Google Scholar

Tsikriktsis N. A review of techniques for treating missing data in OM survey research. J Oper Manag 2005; 24: 53–62.10.1016/j.jom.2005.03.001Search in Google Scholar

Tukey JW. Exploratory data analysis. Behavior science, 1st ed., London: Pearson, 1977.Search in Google Scholar

van Dyk DA, Meng XL. The art of data augmentation. J Comput Graph Stat 2001; 10: 1–50.10.1198/10618600152418584Search in Google Scholar

Vatanen T. Missing value imputation using subspace methods with applications on survey data. Master’s thesis, Aalto University, Espoo, Finland 2012.Search in Google Scholar

Vatanen T, Osmala M, Raiko T, Lagus K, Sysi-Aho M, Orešic M, Honkela T, hdesmä ki HL. Self-organization and missing values in SOM and GTM. Neurocomputing 2015; 147: 60–70.10.1016/j.neucom.2014.02.061Search in Google Scholar

Venkatasubramanian V. Drowning in data: informatics and modeling challenges in a data-rich networked world. AIChE J 2009; 55: 2–8.10.1002/aic.11756Search in Google Scholar

Verboven S, Hubert M. LIBRA: a MATLAB library for robust analysis. Chemometr Intell Lab Syst 2005; 75: 127–136.10.1016/j.chemolab.2004.06.003Search in Google Scholar

Vetterli M, Herley C. Wavelets and filter banks: theory and design. IEEE Trans Signal Proc 1992; 40: 2207–2232.10.1109/78.157221Search in Google Scholar

Walczak B, Massart D. Dealing with missing data: Part I. Chemometr Intell Lab Syst 2001a; 58: 15–27.10.1016/S0169-7439(01)00131-9Search in Google Scholar

Walczak B, Massart D. Dealing with missing data: Part II. Chemometr Intell Lab Syst 2001b; 58: 29–42.10.1016/S0169-7439(01)00132-0Search in Google Scholar

Wang J, He QP. A bayesian approach for disturbance detection and classification and its application to state estimation in run-to-run control. IEEE Trans Semiconduct Manufact 2007; 20: 126–136.10.1109/TSM.2007.895216Search in Google Scholar

Wang J, He QP. Multivariate statistical process monitoring based on statistics pattern analysis. Ind Eng Chem Res 2010; 49: 7858–7869.10.1021/ie901911pSearch in Google Scholar

Weber R. Measurement smoothing with a nonlinear exponential filter. AIChE J 1980; 26: 132–134.10.1002/aic.690260120Search in Google Scholar

Wentzell PD, Andrews DT, Hamilton DC, Faber K, Kowalski BR. Maximum likelihood principal component analysis. J Chemom 1997; 11: 339–366.10.1002/(SICI)1099-128X(199707)11:4<339::AID-CEM476>3.0.CO;2-LSearch in Google Scholar

Westerhuis JA, Kourti T, MacGregor JF. Analysis of multiblock and hierarchical PCA and PLS models. J Chemom 1998; 12: 301–321.10.1002/(SICI)1099-128X(199809/10)12:5<301::AID-CEM515>3.0.CO;2-SSearch in Google Scholar

Wettschereck D. A study of distance-based machine learning algorithms. PhD thesis, Department of Computer Science, Oregon State University, Corvallis 1994.Search in Google Scholar

Wiberg T. Computation of principal components when data are missing. In: Symposium of Computational Statistics. Berlin, Germany, 1976; 229–236.Search in Google Scholar

Wiegand P, Pell R, Comas E. Simultaneous variable selection and outlier detection using a robust genetic algorithm. Chemometr Intell Lab Syst 2009; 98: 108–114.10.1016/j.chemolab.2009.05.001Search in Google Scholar

Wiener N. Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications, 1st ed., Cambridge, MA: MIT Press, 1964.Search in Google Scholar

Willems G, Pison G, Rousseeuw P, Van Aelst S. A hotelling test based on MCD. In: Härdle W, Rönz B, editors. Compstat. Heidelberg: Physica-Verlag, 2002: 117–122.Search in Google Scholar

Wise BM, Gallagher NB. Multivariate modeling of batch processes using summary variables. Technical report, Eigenvector Research, Inc., Wenatchee, WA, 2011.Search in Google Scholar

Xu S, Baldea M, Edgar TF, Wojsznis W, Blevins T, Nixon M. An improved methodology for outlier detection in dynamic datasets. AIChE J 2015; 61: 419–433.10.1002/aic.14631Search in Google Scholar

Yan X. Multivariate outlier detection based on self-organizing map and adaptive nonlinear map and its application. Chemometr Intell Lab Syst 2011; 107: 251–257.10.1016/j.chemolab.2011.04.007Search in Google Scholar

Yang ZJ, Hachino T, Tsuji T. On-line identification of continuous time-delay systems combining least-squares techniques with a genetic algorithm. Int J Control 1997; 66: 23–42.10.1080/002071797224801Search in Google Scholar

Yu J, Qin SJ. Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models. AIChE J 2008; 54: 1811–1829.10.1002/aic.11515Search in Google Scholar

Zeng J, Gao C. Improvement of identification of blast furnace ironmaking process by outlier detection and missing value imputation. J Process Control 2009; 19: 1519–1528.10.1016/j.jprocont.2009.07.006Search in Google Scholar

Zhang Y, Abdulla WH. A comparative study of time-delay estimation techniques using microphone arrays. Technical Report 619, Department of Electrical and Computer Engineering, The University of Auckland 2005.Search in Google Scholar

Zhang Z, Chen J. Simultaneous data reconciliation and gross error detection for dynamic systems using particle filter and measurement test. Comput Chem Eng 2014; 69: 66–74.10.1016/j.compchemeng.2014.06.014Search in Google Scholar

Zhang Z, Dong F. Fault detection and diagnosis for missing data systems with a three time-slice dynamic Bayesian network approach. Chemometr Intell Lab Syst 2014; 138: 30–40.10.1016/j.chemolab.2014.07.009Search in Google Scholar

Zhang S, Qin Z, Ling CX, Sheng S. “Missing is useful”: missing values in cost-sensitive decision trees. IEEE Trans Knowl Data Eng 2005; 17: 1689–1693.10.1109/TKDE.2005.188Search in Google Scholar

Zhao Z, Huang B, Liu F. Bayesian method for state estimation of batch process with missing data. Comput Chem Eng 2013a; 53: 14–24.10.1016/j.compchemeng.2013.01.011Search in Google Scholar

Zhao Z, Huang B, Liu F. Parameter estimation in batch process using EM algorithm with particle filter. Comput Chem Eng 2013b; 57: 159–172.10.1016/j.compchemeng.2013.03.024Search in Google Scholar

Zhao Z, Li Q, Huang M, Liu F. Concurrent PLS-based process monitoring with incomplete input and quality measurements. Comput Chem Eng 2014; 67: 69–82.10.1016/j.compchemeng.2014.03.022Search in Google Scholar

Zhen D, Zhao HL, Gu F, Ball AD. Phase-compensation-based dynamic time warping for fault diagnosis using the motor current signal. Meas Sci Technol 2012; 23: 55601.10.1088/0957-0233/23/5/055601Search in Google Scholar

Zhou DH, Frank PM. A real-time estimation approach to time-varying time delay and parameters of NARX processes. Comput Chem Eng 2000; 23: 1763–1772.10.1016/S0098-1354(99)00325-7Search in Google Scholar

Zhou XY, Lim JS. Replace missing values with EM algorithm based on GMM and naive Bayesian. Int J Soft Eng Res Appl 2014; 8: 177–188.Search in Google Scholar

Zhou J, Luecke R. Estimation of the covariances of the process noise and measurement noise for a linear discrete dynamic system. Comput Chem Eng 1995; 19: 187–195.10.1016/0098-1354(94)E0046-PSearch in Google Scholar

Zhu J, Ge Z, Song Z. Robust modeling of mixture probabilistic principal component analysis and process monitoring application. AIChE J 2014; 60: 2143–2157.10.1002/aic.14419Search in Google Scholar

Zikopoulos PC, Eaton C, deRoos D, Deutsch T, Lapis G. Understanding big data: analytics for enterprise class hadoop and streaming data. New York: McGraw-Hill Osborne Media, 2011.Search in Google Scholar

Received: 2015-4-8
Accepted: 2015-8-12
Published Online: 2015-9-15
Published in Print: 2015-10-1

©2015 by De Gruyter

Downloaded on 25.4.2024 from https://www.degruyter.com/document/doi/10.1515/revce-2015-0022/html