Multivariate Data Analysis (Chemometrics)

Part of the book series: Food Engineering Series (FSES)

Abstract

Chemometrics plays a key role in process analytical technology (PAT) strategies. It is essential for understanding and diagnosing real-time processes and for keeping them under multivariate statistical control. This chapter covers design of experiments, exploratory analysis, quantitative predictive modelling, classification, multivariate process monitoring, and multi-block and multi-way analyses. Its objective is to describe chemometric methods, with the main focus on understanding, interpreting and evaluating the usefulness of the results.

References

  • Abrahamsson C, Johansson J et al (2003) Comparison of different variable selection methods conducted on NIR transmission measurements on intact tablets. Chemometr Intell Lab Syst 69:3–12

  • Alsberg BK, Woodward AM et al (1997) An introduction to wavelet transforms for chemometricians: a time-frequency approach. Chemometr Intell Lab Syst 37(2):215–239

  • Andersson CA (1999) Direct orthogonalization. Chemometr Intell Lab Syst 47(1):51–63

  • Andrew A, Fearn T (2004) Transfer by orthogonal projection: making near-infrared calibrations robust to between-instrument variation. Chemometr Intell Lab Syst 72(1):51–56

  • Ankerst M, Breunig MM et al (1999) OPTICS: ordering points to identify the clustering structure. ACM SIGMOD international conference on management of data (SIGMOD’99), Philadelphia

  • Araujo PW, Brereton RG (1996a) Experimental design I. Screening. TrAC Trends Anal Chem 15(1):26–31

  • Araujo PW, Brereton RG (1996b) Experimental design II. Optimization. TrAC Trends Anal Chem 15(2):63–70

  • Azzouz T, Puigdoménech A et al (2003) Comparison between different data pretreatment methods in the analysis of forage samples using near-infrared diffuse reflectance spectroscopy and partial least-squares multivariate calibration method. Anal Chim Acta 484:121–134

  • Barker M, Rayens W (2003) Partial least squares for discrimination. J Chemometr 17:166–173

  • Barnes RJ, Dhanoa MS et al (1989) Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl Spectrosc 43(5):772–777

  • Berrueta LA, Alonso-Salces RM et al (2007) Supervised pattern recognition in food analysis. J Chromatogr A 1158(1–2):196–214

  • Bouveresse E, Massart DL (1996) Improvement of the piecewise direct standardisation procedure for the transfer of NIR spectra for multivariate calibration. Chemometr Intell Lab Syst 32:201–213

  • Bouveresse E, Hartmann C et al (1996) Standardization of near-infrared spectrometric instruments. Anal Chem 68(6):982–990

  • Box GEP, Draper NR (1987) Empirical model-building and response surfaces. Wiley, New York

  • Breiman L, Friedman JH et al (1984) Classification and regression trees. Wadsworth International Group, Belmont

  • Bro R (1996) Multiway calibration, multilinear PLS. J Chemometr 10:47–61

  • Bro R (1997) PARAFAC. Tutorial and applications. Chemometr Intell Lab Syst 38(2):149–171

  • Bro R (1999) Exploratory study of sugar production using fluorescence spectroscopy and multi-way analysis. Chemometr Intell Lab Syst 46(2):133–147

  • Bro R, Andersson CA et al (1999) PARAFAC2—Part II. Modeling chromatographic data with retention time shifts. J Chemometr 13(3-4):295–309

  • Bry X, Verron T et al (2009) Exploring a physico-chemical multi-array explanatory model with a new multiple covariance-based technique: structural equation exploratory regression. Anal Chim Acta 642(1–2):45–58

  • Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2(2):121–167

  • Carroll JD, Chang J (1970) Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition. Psychometrika 35:283–319

  • Centner V, Massart D-L et al (1996) Elimination of uninformative variables for multivariate calibration. Anal Chem 68(21):3851–3858

  • Chauchard F, Cogdill R et al (2004a) Application of LS-SVM to non-linear phenomena in NIR spectroscopy: development of a robust and portable sensor for acidity prediction in grapes. Chemometr Intell Lab Syst 71(2):141–150

  • Chauchard F, Roger JM et al (2004b) Correction of the temperature effect on near infrared calibration—application to soluble solid content prediction. J Near Infrared Spectrosc 12:199–205

  • Chessel D, Hanafi M (1996) Analyses de la co-inertie de K nuages de points. Revue de Statistique Appliquée XLIV(2):35–60

  • Christiansen KF, Vegarud G et al (2004) Hydrolyzed whey proteins as emulsifiers and stabilizers in high-pressure processed dressings. Food Hydrocoll 18(5):757–767

  • Clark RD (1997) OptiSim: an extended dissimilarity selection method for finding diverse representative subsets. J Chem Inf Comput Sci 37:1181–1188

  • Cleveland WS, Devlin SJ (1988) Locally weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc 83(403):596–610

  • Cogdill RP, Dardenne P (2003) Least-squares support vector machines for chemometrics: an introduction and evaluation. J Near Infrared Spectrosc 12(2):93–100

  • Cornell JA (1990) Experiments with mixtures. Wiley, New York

  • Cortes C, Vapnik V (1995) Support vector networks. Mach Learning 20:273–297

  • Dahl KS, Piovoso MJ et al (1999) Translating third-order data analysis methods to chemical batch processes. Chemometr Intell Lab Syst 46(2):161–180

  • Dantas-Filho HA, Galvao RKH et al (2004) A strategy for selecting calibration samples for multivariate modelling. Chemometr Intell Lab Syst 72:83–91

  • Daszykowski M, Walczak B et al (2001) Looking for natural patterns in data: part 1. Density-based approach. Chemometr Intell Lab Syst 56(2):83–92

  • Daszykowski M, Kaczmarek K et al (2007) Robust statistics in data analysis—a review. Basic concepts. Chemometr Intell Lab Syst 85:203–219

  • Daubechies I (1990) The wavelet transform, time-frequency localization and signal analysis. IEEE Trans Inform Theory 36(5):961–1005

  • Davies AMC, Britcher HV et al (1988) The application of Fourier-transformed near-infrared spectra to quantitative analysis by comparison of similarity indices (CARNAC). Microchim Acta 94(1–6):61–64

  • De Belie N, Sivertsvik M et al (2003) Differences in chewing sounds of dry-crisp snacks by multivariate data analysis. J Sound Vib 266(3):625–643

  • De Jong S (1993) SIMPLS: an alternative approach to partial least squares regression. Chemometr Intell Lab Syst 18(3):251–263

  • de Juan A, Tauler R (2003) Chemometrics applied to unravel multicomponent processes and mixtures: revisiting latest trends in multivariate resolution. Anal Chim Acta 500(1-2):195–210

  • de Juan A, Tauler R (2006) Multivariate curve resolution (MCR) from 2000: progress in concepts and applications. Crit Rev Anal Chem 36:163–176

  • De Maesschalck R, Jouan-Rimbaud D et al (2000) The Mahalanobis distance. Chemometr Intell Lab Syst 50(1):1–18

  • Devaux MF, Bertrand D et al (1988) Application of multidimensional analyses to the extraction of discriminant spectral patterns from NIR spectra. Appl Spectrosc 42(6):941–1132

  • Devaux MF, Robert P et al (1993) Canonical correlation analysis of mid and near infrared oil spectra. Appl Spectrosc 47:1024–1028

  • Ellekjær MR, Ilseng MA et al (1996) A case study of the use of experimental design and multivariate analysis in product improvement. Food Qual Prefer 7(1):29–36

  • Eriksson L, Johansson E et al (1998) Mixture design-design generation, PLS analysis, and model usage. Chemometr Intell Lab Syst 43(1–2):1–24

  • Ester M, Kriegel H-P et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. 2nd international conference on knowledge discovery and data mining

  • Fearn T (2001) Standardisation and calibration transfer for near infrared instrument: a review. J Near Infrared Spectrosc 9:229–244

  • Felipe-Sotelo M, Gustems L et al (2006) Investigation of geographical and temporal distribution of tropospheric ozone in Catalonia (North-East Spain) during the period 2000–2004 using multivariate data analysis methods. Atmos Environ 40(38):7421–7436

  • Fernández Pierna JA, Wahl F et al (2002) Methods for outlier detection in prediction. Chemometr Intell Lab Syst 63(1):27–39

  • Fernández Pierna JA, Baeten V et al (2006) Screening of compound feeds using NIR hyperspectral data. Chemometr Intell Lab Syst 84(1–2):114–118

  • Fernández-Ibáñez V, Fearn T et al (2010) Development and validation of near infrared microscopy spectral libraries of ingredients in animal feed as a first step to adopting traceability and authenticity as guarantors of food safety. Food Chemistry. (In press, corrected proof).

  • Ferreira SLC, Santos WNL dos et al (2004) Doehlert matrix: a chemometric tool for analytical chemistry-review. Talanta 63(4):1061–1067

  • Ferreira SLC, Bruns RE et al (2007) Box-Behnken design: an alternative for the optimization of analytical methods. Anal Chim Acta 597(2):179–186

  • Feudale RN, Tan H et al (2002a) Piecewise orthogonal signal correction. Chemometr Intell Lab Syst 63:129–138

  • Feudale RN, Woody NA et al (2002b) Transfer of multivariate calibration models: a review. Chemometr Intell Lab Syst 64(2):181–192

  • Fisher R (1936) The use of multiple measurements in taxonomic problems. Annals Eugenics 7:179–188

  • Gacula MC, Jagbir Singh JR (1984) Statistical methods in food and consumer research. Academic, New York

  • Galvao RKH, Araujo MCU et al (2005) A method for calibration and validation subset partitioning. Talanta 67:736–740

  • Garcia-Munoz S, Kourti T et al (2003) Troubleshooting of an industrial batch process using multivariate methods. Ind Eng Chem Res 42:3592–3601

  • Geladi P, MacDougall D et al (1985) Linearization and scatter-correction for near-infrared reflectance spectra of meat. Appl Spectrosc 39(3):491–500

  • Ghosh J, Turner K (1994) Structural adaptation and generalization in supervised feed-forward networks. J Artificial Neural Netw 1(4):431–458

  • Gnanadesikan R, Kettenring JR (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28:81–124

  • Goodacre R, Kell DB et al (1993) Rapid assessment of the adulteration of virgin olive oils by other seed oils using pyrolysis mass spectrometry and artificial neural networks. J Sci Food Agric 63:297–307

  • Halkidi M, Batistakis Y et al (2002a) Cluster validity methods: part I. ACM SIGMOD Rec 31(2):40–45

  • Halkidi M, Batistakis Y et al (2002b) Clustering validity checking methods: part II. ACM SIGMOD Rec 31(3):19–27

  • Harshman RA (1970) Foundations of the PARAFAC procedure. UCLA Working Papers in Phonetics 16:1–84

  • Hart P (1967) Nearest neighbour pattern classification. IEEE Trans Inform Theory 13(1):21–27

  • Hotelling H (1936) Relations between two sets of variates. Biometrika 28:321–377

  • Igne B, Roger J-M et al (2009) Improving the transfer of near infrared prediction models by orthogonal methods. Chemometr Intell Lab Syst 99(1):57–65

  • Jain AK (2009) Data clustering: 50 years beyond K-means. Pattern Recognit Lett. (In press, corrected proof)

  • Jain AK, Murty MN et al (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323

  • Jiang J-H, Liang Y et al (2004) Principles and methodologies in self-modeling curve resolution. Chemometr Intell Lab Syst 71(1):1–12

  • Johnson SC (1966) Hierarchical clustering schemes. Psychometrika 32(3):241–254

  • Kassidas A, MacGregor JF et al (1998) Synchronization of batch trajectories using dynamic time warping. AIChE J 44:864–875

  • Kennard RW, Stone LA (1969) Computer aided design of experiments. Technometrics 11(1):137–148

  • Kiang MY (2001) Extending the Kohonen self-organizing map networks for clustering analysis. Comput Stat Data Anal 38(2):161–180

  • Kiers HAL, Berge JMFt et al (1999) PARAFAC2—part I. A direct fitting algorithm for the PARAFAC2 model. J Chemometr 13(3–4):275–294

  • Kohler A, Skaga A et al (2002) Sorting salted cod fillets by computer vision: a pilot study. Comput Electr Agric 36(1):3–16

  • Kohonen T (1990) The self organizing map. Proceedings of the IEEE 78(9):1464–1480

  • Kohonen T (1998) The self-organizing map. Neurocomputing 21(1–3):1–6

  • Kourti T (2006) Process analytical technology beyond real-time analyzers: the role of multivariate analysis. Crit Rev Anal Chem 36:257–278

  • Lawton WH, Sylvestre EA (1971) Self modeling curve resolution. Technometrics 13:617–633

  • Leardi R (2009) Experimental design in chemistry: a tutorial. Anal Chim Acta 652(1–2):161–172

  • Leardi R, González AL (1998) Genetic algorithms applied to feature selection in PLS regression: how and when to use them. Chemometr Intell Lab Syst 41(2):195–207

  • Liang Y-Z, Kvalheim OM (1996) Robust methods for multivariate analysis—a tutorial review. Chemometr Intell Lab Syst 32(1):1–10

  • Lima FSG, Borges LEP (2002) Evaluation of standardisation methods of near infrared calibration models. J Near Infrared Spectrosc 10(4):269–278

  • Lundstedt T, Seifert E et al (1998) Experimental design and optimization. Chemometr Intell Lab Syst 42(1–2):3–40

  • MacGregor J, Yu H et al (2005) Data-based latent variable methods for process analysis, monitoring and control. Comput Chem Eng 29(6):1217–1223

  • MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Fifth Berkeley symposium on mathematical statistics and probability

  • Marbach R (2005) A new method for multivariate calibration. J Near Infrared Spectrosc 13:241–254

  • Marbach R (2007a) Multivariate calibration: a science-based method—part 1. Pharmaceutical Manufacturing 6(1):42–47

  • Marbach R (2007b) Multivariate calibration: a science-based method—part 2. Pharmaceutical Manufacturing 6(2):44–47

  • Martens H, Stark E (2001) Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy. J Pharm Biomed Anal 9(8):625–635

  • Martens H, Anderssen E et al (2005) Regression of a data matrix on descriptors of both its rows and of its columns via latent variables: L-PLSR. Comput Stat Data Anal 48(1):103–123

  • Mazerolles G, Hanafi M et al (2006) Common components and specific weights analysis: a chemometric method for dealing with complexity of food products. Chemometr Intell Lab Syst 81(1):41–49

  • McClure WF, Norris KH et al (1977) Rapid spectrophotometric analysis of the chemical composition of tobacco. Part 1. Total reducing sugars. Beitr Tabakforsch 9(1):13–18

  • Montgomery DC (1997) Multivariate quality control. Introduction to statistical quality control. John Wiley & Sons Inc, New York, pp 360–373

  • Navea S, Tauler R et al (2006) Monitoring and modelling of protein processes using mass spectrometry, circular dichroism and multivariate curve resolution methods. Anal Chem 78:4768–4778

  • Nomikos P, MacGregor JF (1995a) Multi-way partial least squares in monitoring batch processes. Chemometr Intell Lab Syst 30(1):97–108

  • Nomikos P, MacGregor JF (1995b) Multivariate SPC charts for monitoring batch processes. Technometrics 37:41–59

  • Nørgaard L, Saudland A et al (2000) Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl Spectrosc 54(3):413–419

  • Norris KH, Williams PC (1984) Optimization of mathematical treatments of raw near-infrared signal in the measurement of protein in hard red spring wheat: I. Influence of particle size. Cereal Chem 61:158–165

  • Novales B, Guillaume S et al (1998) Particle size characterisation of in-flow milling products by video image analysis using global features. J Sci Food Agric 78(2):187–195

  • Olivieri A, Faber NM et al (2006) Guidelines for calibration in analytical chemistry. Part 3: uncertainty estimation and figures of merit for multivariate calibration. Pure Appl Chem 78(3):633–661

  • Ortiz MC, Sarabia L (2007) Quantitative determination in chromatographic analysis based on n-way calibration strategies. J Chromatogr A 1158(1–2):94–110

  • Plackett RL, Burman JP (1946) The design of optimal multifactorial experiments. Biometrika 33:305–325

  • Preys S, Roger JM et al (2008) Robust calibration using orthogonal projection and experimental design. Application to the correction of the light scattering effect on turbid NIR spectra. Chemometr Intell Lab Syst 91:28–33

  • Qannari EM, Wakeling I et al (2000) Defining the underlying sensory dimensions. Food Qual Prefer 11(1–2):151–154

  • Qin SJ, Valle S et al (2001) On unifying multiblock analysis with application to decentralized process monitoring. J Chemometr 15:715–742

  • Rechtschaffner RL (1967) Saturated fractions of 2ⁿ and 3ⁿ factorial designs. Technometrics 9:569–575

  • Rinnan A, Berg Fvd et al (2009) Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal Chem 28(10):1201–1222

  • Roger J-M, Chauchard F et al (2003) EPO-PLS external parameter orthogonalisation of PLS application to temperature-independent measurement of sugar content of intact fruits. Chemometr Intell Lab Syst 66(2):191–204

  • Rouillé J, Le Bail A et al (2000) Influence of formulation and mixing conditions on breadmaking qualities of French frozen dough. J Food Eng 43(4):197–203

  • Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79:871–880

  • Roussel SA, Hardy CL et al (2001) Detection of Roundup Ready™ soybeans by near-infrared spectroscopy. Appl Spectrosc 55(10):1425–1430

  • Roussel S, Bellon-Maurel V et al (2003) Authenticating white grape must variety with classification models based on aroma sensors, FT-IR and UV spectrometry. J Food Eng 60(4):407–419

  • Rumelhart DE, Hinton GE et al (1986) Learning internal representations by error propagation. Parallel Distributed Processing, Cambridge

  • Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36(8):1627–1639

  • Serrano-Megías M, López-Nicolás JM (2006) Application of agglomerative hierarchical clustering to identify consumer tomato preferences: influence of physicochemical and sensory characteristics on consumer response. J Sci Food Agric 86(4):493–499

  • Shenk JS, Westerhaus MO et al (1998) Investigation of a LOCAL calibration procedure for near infrared instruments. J Near Infrared Spectrosc 5(4):223–232

  • Sirieix A, Downey G (1993) Commercial wheatflour authentication by discriminant analysis of near infrared reflectance spectra. J Near Infrared Spectrosc 1:187–197

  • Snee RD (1977) Validation of regression models: methods and examples. Technometrics 19(4):415–428

  • Svensson O, Kourti T et al (2002) An investigation of orthogonal signal correction algorithms and their characteristics. J Chemometr 16:176–188

  • Tauler R (1995) Multivariate curve resolution applied to second order data. Chemometr Intell Lab Syst 30(1):133–146

  • Thimm G, Fiesler E (1997) Pruning of neural networks. I.-R. R. 97-03. Valais, Switzerland, Dalle Molle Institute for perceptive artificial intelligence

  • Tillmann P, Paul C (1998) The repeatability file-a tool for reducing the sensitivity of near infrared spectroscopy calibrations to moisture variation. J Near Infrared Spectrosc 6(1):61–68

  • Tokatli F, Cinar A et al (2005) HACCP with multivariate process monitoring and fault diagnosis techniques: application to a food pasteurization process. Food Control 16(5):411–422

  • Tracy ND, Young JC et al (1992) Multivariate control charts for individual observations. J Qual Technol 24:88–95

  • Trygg J, Wold S (2002) Orthogonal projections to latent structures (O-PLS). J Chemometr 16:119–128

  • Tucker LR (1966) Some mathematical notes on three-mode factor analysis. Psychometrika 31:279–311

  • van Sprang ENM, Ramaker H-J et al (2002) Critical evaluation of approaches for on-line batch process monitoring. Chem Eng Sci 57(18):3979–3991

  • Wang Y, Veltkamp DJ et al (1991) Multivariate instrument standardization. Anal Chem 63(23):2750–2756

  • Wangen LE, Kowalski BR (1988) A multiblock partial least squares algorithm for investigating complex chemical systems. J Chemometr 3:3–20

  • Weigend AS, Huberman BA et al (1990) Predicting the future: a connectionist approach. Int J Neural Syst 1(3):193–209

  • Westad F, Schmidt A et al (2008) Incorporating chemical band-assignment in near infrared spectroscopy regression models. J Near Infrared Spectrosc 16:265–273

  • Westerhaus MO (1991) Improving repeatability of calibrations across instruments. 3rd International conference on near infrared spectroscopy, Gembloux, Belgium

  • Westerhuis JA, Kourti T et al (1998) Analysis of multiblock and hierarchical PCA and PLS models. J Chemometr 12:301–321

  • Westerhuis JA, De Jong S et al (2001) Direct orthogonal signal correction. Chemometr Intell Lab Syst 56:13–25

  • Wold H (1982) Soft modelling: the basic design and some extensions. System under indirect observation, vol 2. (H Wold, KG Jöreskog (eds)). Amsterdam, North Holland, pp 1–54

  • Wold S (1992) Nonlinear partial least squares modelling II. Spline inner relation. Chemometr Intell Lab Syst 14(1–3):71–84

  • Wold S, Sjostrom M (1977) SIMCA: a method for analyzing chemical data in terms of similarity and analogy—(Chapter Book). Chemometr Theory Appl 52:243–282

  • Wold S, Martens H et al (1983) The multivariate calibration problem in chemistry solved by the PLS method. Matrix Pencils. Springer, Heidelberg

  • Wold S, Martens H, & Wold H (1984) In S. Wold (Ed.), Muldast Proceedings, Technical Report, Research Group for Chemometrics, Umeå University, Sweden

  • Wold S, Kettaneh-Wold N et al (1989) Nonlinear PLS modeling. Chemometr Intell Lab Syst 7(1–2):53–65

  • Wold S, Kettaneh N et al (1996) Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection. J Chemometr 10(5-6):463–482

  • Wold S, Antti H et al (1998a) Orthogonal signal correction of near-infrared spectra. Chemometr Intell Lab Syst 44(1–2):175–185

  • Wold S, Kettaneh N et al (1998b) Modelling and diagnostics of batch processes and analogous kinetic experiments. Chemometr Intell Lab Syst 44(1–2):331–340

  • Wold S, Cheney J et al (2006) The chemometric analysis of point and dynamic data in pharmaceutical and biotech production (PAT)—some objectives and approaches. Chemometr Intell Lab Syst 84(1–2):159–163

  • Wu W, Walczak B et al (1996) Feature reduction by Fourier transform in pattern recognition of NIR data. Anal Chim Acta 331:75–83

  • Wülfert F, Kok WT et al (2000a) Correction of temperature-induced spectral variation by continuous piecewise direct standardization. Anal Chem 72:1639–1644

  • Wülfert F, Kok WT et al (2000b) Linear techniques to correct for temperature-induced spectral variation in multivariate calibration. Chemometr Intell Lab Syst 51:189–200

  • Zeaiter M, Roger JM et al (2005) Robustness of models developed by multivariate calibration. Part II: The influence of pre-processing methods. TrAC Trends Anal Chem 24(5):437–445

  • Zeaiter M, Roger JM et al (2006) Dynamic orthogonal projection. A new method to maintain the on-line robustness of multivariate calibrations. Application to NIR-based monitoring of wine fermentations. Chemometr Intell Lab Syst 80(2):227–235

  • Zhu Y, Fearn T et al (2008) Error removal by orthogonal subtraction (EROS): a customised pretreatment for spectroscopic data. J Chemometr 22:130–134

Acknowledgments

The authors thank Dr. Mazerolles (INRA) for reviewing the multi-block section, and Dr. Williams (Canadian Grain Commission), CAMO (Oslo, Norway) and Dr. Guillaume (Cemagref) for authorising the use of their data.

Author information

Corresponding author

Correspondence to Sylvie Roussel.

Annex: Figures of Merit

Some of the statistical criteria (figures of merit) used in this chapter are defined mathematically here.

The root-mean-squared error (RMSE) is defined as the square root of the ratio of the prediction error sum of squares (PRESS) to the estimated degrees of freedom of the set. The square root is generally preferred because it is expressed in the same unit as the original data. RMSE is calculated for a calibration set (RMSEC) and for a prediction (or validation) set (RMSEP). In the equations that follow, y_i is the reference value and ŷ_i the predicted value for sample i, and n is the number of samples in the set:

$$ \text{PRESS}=\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2} $$
(2.31)
$$ \text{RMSEC}=\sqrt{\frac{\text{PRESS}}{n-p-1}} $$
(2.32)
$$ \text{RMSEP}=\sqrt{\frac{\text{PRESS}}{n}}. $$
(2.33)
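
As a minimal sketch of Eqs. (2.31) to (2.33), the three quantities can be computed as follows. The function names and the sample values are made up for illustration, and `p` is assumed here to be the number of fitted model terms, which the annex does not state explicitly.

```python
import math

def press(y_ref, y_hat):
    """Prediction error sum of squares, Eq. (2.31)."""
    return sum((y - yh) ** 2 for y, yh in zip(y_ref, y_hat))

def rmsec(y_ref, y_hat, p):
    """Calibration error, Eq. (2.32).

    p is ASSUMED to be the number of fitted model terms.
    """
    n = len(y_ref)
    if n - p - 1 <= 0:
        raise ValueError("RMSEC needs n > p + 1")
    return math.sqrt(press(y_ref, y_hat) / (n - p - 1))

def rmsep(y_ref, y_hat):
    """Prediction (validation) error, Eq. (2.33)."""
    return math.sqrt(press(y_ref, y_hat) / len(y_ref))

# Made-up reference and predicted values (not from the chapter's data sets)
y_ref = [10.0, 12.0, 14.0, 16.0, 18.0]
y_hat = [11.0, 11.0, 15.0, 15.0, 19.0]

print(press(y_ref, y_hat))       # sum of squared residuals
print(rmsec(y_ref, y_hat, p=2))  # divides by n - p - 1 = 2
print(rmsep(y_ref, y_hat))       # divides by n = 5
```

For the same residuals, RMSEC is larger than RMSEP because its denominator is reduced by the degrees of freedom used by the model.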

The bias is the mean error, i.e. the systematic part of the error:

$$ bs=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right). $$
(2.34)

The coefficient of determination (R²) represents the spread of the predictions. It should not be considered alone. For example, the R² value could be almost 1 whereas the bias and/or the PRESS could be high:

$$ R^{2}=1-\frac{\text{PRESS}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}. $$
(2.35)
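
The warning about reading R² alone can be illustrated with a small sketch of Eqs. (2.34) and (2.35). The data are made up: reference values spread over a wide range, with every prediction too high by 10 units. The function names are illustrative only.

```python
def bias(y_ref, y_hat):
    """Mean error, Eq. (2.34), with the sign convention y - y_hat."""
    return sum(y - yh for y, yh in zip(y_ref, y_hat)) / len(y_ref)

def r_squared(y_ref, y_hat):
    """Coefficient of determination, Eq. (2.35)."""
    y_mean = sum(y_ref) / len(y_ref)
    press = sum((y - yh) ** 2 for y, yh in zip(y_ref, y_hat))
    ss_tot = sum((y - y_mean) ** 2 for y in y_ref)
    return 1.0 - press / ss_tot

# Made-up values: wide reference range, constant over-prediction of 10 units
y_ref = [0.0, 250.0, 500.0, 750.0, 1000.0]
y_hat = [y + 10.0 for y in y_ref]

print(bias(y_ref, y_hat))       # systematic error of -10 units
print(r_squared(y_ref, y_hat))  # still very close to 1
```

Because the total sum of squares grows with the range of the reference values, a constant offset of 10 units barely lowers R² here, while the bias reveals it directly.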

In classification, errors are usually expressed as the proportion of misclassified samples. A confusion matrix is generally built to summarise the results (Table 2.4). The numbers of correctly classified objects are n₁₁ and n₂₂, and the numbers of misclassified objects are n₂₁ and n₁₂. Errors can also be expressed as first-order errors, which correspond to a lack of sensitivity (e.g. the proportion of samples of class A not classified in A), or as second-order errors, which correspond to a lack of specificity (e.g. the proportion of samples of class B classified in A).

Table 2.4 Confusion matrix
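
A two-class example may make these error rates concrete. The sketch below assumes that n_ij counts the samples of true class i assigned to class j (rows are true classes, columns are assigned classes). This layout is an assumption, since the body of Table 2.4 is not reproduced here, and the labels are made up.

```python
from collections import Counter

def confusion_matrix(true_labels, assigned_labels, classes=("A", "B")):
    """Return counts[i][j]: samples of true class i assigned to class j.

    ASSUMPTION: rows are true classes and columns are assigned classes.
    """
    counts = Counter(zip(true_labels, assigned_labels))
    return [[counts[(t, a)] for a in classes] for t in classes]

# Made-up labels for illustration
true_labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
assigned = ["A", "A", "A", "B", "A", "A", "B", "B", "B", "B"]

(n11, n12), (n21, n22) = confusion_matrix(true_labels, assigned)
total = n11 + n12 + n21 + n22

error_rate = (n12 + n21) / total   # proportion of misclassified samples
first_order_a = n12 / (n11 + n12)  # samples of A not classified in A
second_order_a = n21 / (n21 + n22) # samples of B classified in A

print(error_rate, first_order_a, second_order_a)
```

Under this layout, the first-order error for class A is computed along the row of A and the second-order error along the row of B, so the two rates have different denominators.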

Other figures of merit are widely used in certain applications; they can be found in Olivieri et al. (2006).

Copyright information

© 2014 Springer Science+Business Media, New York

About this chapter

Cite this chapter

Roussel, S., Preys, S., Chauchard, F., Lallemand, J. (2014). Multivariate Data Analysis (Chemometrics). In: O'Donnell, C., Fagan, C., Cullen, P. (eds) Process Analytical Technology for the Food Industry. Food Engineering Series. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0311-5_2
