Dedicated to Edmund R. Malinowski. Secondary, model‐based examination of model‐free analysis results: Making the most of soft‐modelling outcomes

Edmund R. Malinowski is well known for his factor analysis‐based work; he is clearly less well known for his chemical model‐based analyses of chemical data. In this contribution, we discuss the, at the time innovative, idea of subjecting the primary model‐free analysis results to a secondary quantitative model‐based evaluation. Two examples from his research serve as illustrations: the complexation of Cu2+ with cyclodextrin and the dimerisation and trimerisation of methylene blue. In both examples, the spectrophotometric titration data are first analysed by window factor analysis, WFA, resulting in the concentration profiles. These primary, model‐free results allowed Malinowski to gain qualitative insight into the chemical processes. Additional quantitative analyses of the concentration profiles, based on the previously obtained reaction model, resulted in the numerical values of the underlying equilibrium constants. This contribution relates and compares this methodology with alternative ways of analysing the same data.

concentration profiles of all reacting and absorbing species, and a matrix A, which contains, row-wise, the molar absorption spectra of these species. This is best written as a matrix equation:
$$D = CA + R \qquad (1)$$
R is the matrix of residuals, the difference between the real, noisy data D and the re-calculated product CA. In many instances, this decomposition is not unique, and a variety of methods allow the determination of the range of feasible solutions. 5,6 A feasible solution encompasses a set of concentration profiles in C and absorption spectra in A that satisfy Equation (1) as well as additional constraints such as positivity for all elements of C and A. If accurate concentration window information is available, obtained, for example, via EFA 2 or WFA, 3 and suitable conditions of rank overlap among the components are met, 7 the decomposition is often unique. This allows further interpretation of the results: The absorption spectra can be compared with known spectra, possibly resulting in chemical identification of the species. Another additional analysis, this time of the concentration profiles, has been pioneered by Malinowski. It is worth stressing that non-unique decompositions are hardly amenable to such additional analyses.
Accurate concentration profiles invite further analyses by the chemist interested in the fundamentals of the process under investigation. First, there can be a qualitative interpretation; that is, concentration profiles can strongly suggest a certain reaction mechanism in kinetics, or a sequence of equilibrium species formed during the titration in an equilibrium study. Once the chemical model is established, its underlying constants, rate and/or equilibrium constants, can be determined quantitatively from the concentration profiles, which were obtained automatically, without any chemical insight. Edmund R. Malinowski has pioneered such secondary qualitative and quantitative analyses. We will in the following re-visit two typical examples.
2.1 | Example 1: Determination of the equilibrium constants for the formation of a di-nuclear complex of Cu2+ with cyclo-dextrin, CD, using edta as an auxiliary ligand or titrant
In aqueous solution, Cu2+ interacts with cyclo-dextrin, CD, to form the complex Cu2(CD). 8 The aim of the investigation is the determination of the equilibrium constant. A simplified model or list of the chemical reactions is given below. Several protonation equilibria are omitted in this list; however, as the pH was kept constant, these protonation equilibria can be formally ignored, and the analysis results in effective equilibrium constants. Ignoring the protonation equilibria will not invalidate the argumentation presented here.
The authors reduced the above system of equilibria and formulated an overall equilibrium, which is approximately described below, again omitting several protonation equilibria.
The equilibrium constant K3 for the interaction of Cu2+ with edta is known, K3 = 1.58 × 10^21. 9 This simplified system of equilibria has only one unknown equilibrium constant, K12.
The measurement consists of a collection of 11 solutions with the total concentrations [Cu2+]tot = 0.002 M, [CD]tot = 0.001 M, and [edta]tot = 0 M to 0.002 M. Applying WFA to the data matrix resulted in a concentration matrix C, which was normalised to the known total concentration. C is shown in Figure 1.
For each solution during the titration (11 solutions in total), the determined concentrations of the three species allow the computation of the unknown equilibrium constant K12. Excluding the first and last solutions, nine values for the equilibrium constant can, in theory, be computed. It turns out that the computations for the different solutions result in substantially different values. The concentrations for the first few solutions are evidently inconsistent, and they were not used for the calculations. The reported values for K12 for the solutions with 0.0008 to 0.0016 M edta are 5.82 × 10^3, 2.43 × 10^3, 2.12 × 10^3, 2.36 × 10^3, and 5.26 × 10^3. The results for the different sets of concentrations should be weighted appropriately; this is, however, far from trivial and has not been performed by the authors.
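The scatter of the individual estimates can be made concrete with a few lines of Python. This is only a sketch that summarises the five values reported in the text; the unweighted arithmetic and geometric means are naive reference points chosen for this illustration, not the appropriate weighting mentioned above, and the variable names are invented here.

```python
import statistics

# The five K12 values reported in the text for the intermediate solutions.
k12_values = [5.82e3, 2.43e3, 2.12e3, 2.36e3, 5.26e3]

# Ratio of largest to smallest estimate: a simple measure of the scatter.
spread = max(k12_values) / min(k12_values)

# Naive, unweighted averages (an assumption of this sketch, not the
# authors' procedure). The geometric mean averages log K rather than K.
arith_mean = statistics.mean(k12_values)
geo_mean = statistics.geometric_mean(k12_values)

print(f"max/min ratio   : {spread:.2f}")
print(f"arithmetic mean : {arith_mean:.3g}")
print(f"geometric mean  : {geo_mean:.3g}")
```

The individual estimates differ by more than a factor of two, which is why a single fit to the complete concentration profiles is the more attractive route.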
The authors proposed an improved analysis method. Rather than calculating individual constants for each solution, the complete WFA concentration profiles are compared with a calculated set of concentration profiles, based on guessed equilibrium constants. This is shown in Figure 2. Equation (4), together with Figure 3, illustrates the process of determining the optimal value for the equilibrium constant K12.
C_WFA, produced by the WFA analysis, is a matrix with 11 rows, the number of solutions, and three columns, the number of species. The matrix C(K12) of concentration profiles has the same dimensions and is a function of K12 and the known total concentrations. Both matrices are shown in Figure 2. The matrix R of residuals is the difference between C_WFA and C(K12). The sum of squares, ssq, over all elements of R represents the quality of the fit. Figure 3 illustrates the determination of the optimal value for K12, resulting in the minimal sum of squares. The optimal value is K12 = 3.16 × 10^3.
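The secondary fit described above, minimising the sum of squares between model-free profiles and profiles calculated from a guessed constant, can be sketched in Python. The sketch rests on several assumptions: it does not use the Cu2+/CD/edta scheme of Example 1 but a hypothetical 1:1 equilibrium M + L ⇌ ML, which has a closed-form solution; the "model-free" profiles c_ref are synthetic stand-ins for C_WFA; and the function names (profiles, ssq, fit_log_k) and the golden-section search are choices made for this illustration, not the authors' implementation.

```python
import math
import random

def profiles(K, m_tot, l_tots):
    """Rows of ([M], [L], [ML]) for the assumed equilibrium M + L <=> ML."""
    rows = []
    for l_tot in l_tots:
        # Mass balances and K = [ML]/([M][L]) give a quadratic in x = [ML];
        # this form of the smaller root avoids cancellation for large K.
        b = m_tot + l_tot + 1.0 / K
        x = 2.0 * m_tot * l_tot / (b + math.sqrt(b * b - 4.0 * m_tot * l_tot))
        rows.append((m_tot - x, l_tot - x, x))
    return rows

def ssq(log_k, c_ref, m_tot, l_tots):
    """Sum of squared residuals R = c_ref - C(K) over all elements."""
    c_calc = profiles(10.0 ** log_k, m_tot, l_tots)
    return sum((r - c) ** 2
               for row_ref, row_calc in zip(c_ref, c_calc)
               for r, c in zip(row_ref, row_calc))

def fit_log_k(c_ref, m_tot, l_tots, lo=0.0, hi=8.0, tol=1e-6):
    """Golden-section search for the log10(K) that minimises ssq.

    Assumes ssq has a single minimum between lo and hi.
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        p = hi - g * (hi - lo)
        q = lo + g * (hi - lo)
        if ssq(p, c_ref, m_tot, l_tots) < ssq(q, c_ref, m_tot, l_tots):
            hi = q
        else:
            lo = p
    return 0.5 * (lo + hi)

# Synthetic stand-in for the model-free profiles: 11 solutions, 3 species.
m_tot = 0.001                                # hypothetical total [M], in M
l_tots = [0.0002 * i for i in range(11)]     # hypothetical titrant totals
k_true = 10.0 ** 3.5                         # hypothetical "true" constant
rng = random.Random(0)
c_ref = [tuple(c + rng.gauss(0.0, 1e-6) for c in row)
         for row in profiles(k_true, m_tot, l_tots)]

log_k_fit = fit_log_k(c_ref, m_tot, l_tots)
print(f"fitted K = {10.0 ** log_k_fit:.3g}, "
      f"ssq = {ssq(log_k_fit, c_ref, m_tot, l_tots):.3g}")
```

Searching on log10 K rather than on K is a design choice of the sketch: equilibrium constants span orders of magnitude, and the sum of squares is usually better behaved on a logarithmic axis.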

2.2 | Example 2: Dimerisation and trimerisation of methylene blue in aqueous solution
A similar, detailed approach has been taken for the quantitative analysis of the dimerisation and trimerisation of methylene blue, MB, in aqueous solution. 10 The following two equilibria are used to describe the process:
After optimising the two equilibrium constants to obtain the best agreement between the WFA profiles and the re-calculated concentration profiles C, systematic errors remain. Figure 4 demonstrates that the results are not quite adequate. Investigation of alternative possibilities resulted in an improved set of equilibria in which the trimer is a more complex structure that includes a chloride ion; see Equation (6). Such an ion pair is to be expected as the trimer carries a high 3+ charge, which is partially neutralised by the chloride ion.
The fit with the improved model is clearly better; see Figure 5. This example demonstrates an interesting and powerful feature of the model-free analysis: Its results can guide the researcher to test a given model and develop alternative models if required.
We can summarise the Malinowski approach using his own words (related to Example 1) 7: "The windows were initially determined by AUTOWFA 11 and further refined by visual inspection. The windows, expressed as solution numbers corresponding to data columns, were determined to be the following: (1) 1-8, (2) 2-10, and (3) 6-11. The concentration profiles extracted by WFA are portrayed in Fig. 4. It is important to recognise that these profiles are independent of any model or speculation concerning the species and their controlling equilibria. Interpretation and modelling of the profiles are performed after extraction and not before extraction, as required by classical methods."

| CONCLUSIONS; ARE THERE FURTHER IMPROVEMENTS?
The two papers by Edmund Malinowski we have discussed in the previous sections were published some 20 years ago. The obvious question is whether there have been further developments in the meantime. As we will see, the answer is a partial 'yes': there are indeed improvements, but some were developed at essentially the same time, while others were already known at the time.
First, a comparable, parallel development published at a similar time incorporated hard-modelling into the iterative alternating least squares (ALS) refinement of otherwise soft-modelling analyses. Some of the concentration profiles were subject to hard-modelling; the remaining ones were left to refinement by the ALS algorithm. 12,13 The differences between the two approaches are subtle but significant. In the Malinowski method, the WFA primary computation is explicit, and the secondary iterative refinement of the parameters is completely independent of the WFA analysis. In the alternative approach, the model-free and model-based parts are intimately intermingled in the iterative ALS process. Some of the concentration profiles, that is, columns of C, are quantitatively defined by a hard-model and its present parameters, while the remaining concentration profiles are only restricted by the ALS constraints. In the iterative ALS cycle, the hard-model parameters as well as the complete remaining concentration columns are refined, hopefully with reliable convergence. These additional freedoms allow virtually unlimited variability, covering many more systems, at the cost of reduced robustness. The Malinowski approach is simpler, thus more robust, but also more limited.
In order to compare with the pure hard-modelling approach, let us start with Equation (4). Two different calculations are required prior to the computation of the residuals: first, the computation of the matrix of WFA profiles, which is a model-free process as there is no chemistry involved; second, the computation of the concentration matrix C, which requires the guess or determination of a chemical reaction model with appropriate constants, followed by the actual calculation of C based on the law of mass action. The difference between the two computation results, the residuals, determines the sum of squares, ssq, which is minimised by iterative refinement of the constants.
A simpler, that is, more direct way is described in the following. First, calculate C as before. Instead of relating C to the WFA profiles, relate it directly to the original data matrix D by computing the best matrix A of molar absorptivities for the present C. We denote this matrix as Â. This is a linear regression and thus an explicit calculation:
$$\hat{A} = C^{+} D = (C^{T} C)^{-1} C^{T} D \qquad (7)$$
C^+ is the pseudo-inverse of C; it can be computed as shown in Equation (7). However, the Matlab notation Â = C\D uses a numerically superior algorithm. Having an estimated C and a corresponding Â, we can compute an estimated data matrix D_est as
$$D_{\mathrm{est}} = C \hat{A}$$
The difference between D and D_est defines a matrix R of residuals and subsequently a sum of squares, ssq:
$$R = D - D_{\mathrm{est}}, \qquad \mathrm{ssq} = \sum_{i} \sum_{j} R_{ij}^{2}$$
It is crucially important to recognise that D_est, and subsequently R as well as ssq, are a function of only the constant(s) K, which define C.
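The direct procedure can be sketched with NumPy, where np.linalg.lstsq plays the role of the Matlab backslash operator. As before, this is an illustration under stated assumptions rather than the published implementation: the equilibrium is a hypothetical 1:1 system M + L ⇌ ML, the spectra in A_true are invented Gaussian bands, the data matrix D is synthetic, and all function and variable names are made up for the example. A plain grid scan over log10 K stands in for the iterative refinement.

```python
import numpy as np

def profiles(K, m_tot, l_tots):
    """Columns [M], [L], [ML] for the assumed equilibrium M + L <=> ML."""
    b = m_tot + l_tots + 1.0 / K
    x = 2.0 * m_tot * l_tots / (b + np.sqrt(b * b - 4.0 * m_tot * l_tots))
    return np.column_stack([m_tot - x, l_tots - x, x])

def fit_spectra(C, D):
    """Least-squares estimate of A for a given C (NumPy analogue of Matlab's backslash)."""
    A_hat, *_ = np.linalg.lstsq(C, D, rcond=None)
    return A_hat

def ssq_direct(log_k, D, m_tot, l_tots):
    """ssq as a function of the constant only: the spectra are eliminated linearly."""
    C = profiles(10.0 ** log_k, m_tot, l_tots)
    R = D - C @ fit_spectra(C, D)      # R = D - D_est
    return float(np.sum(R ** 2))

# Synthetic data (all values hypothetical): 11 solutions, 31 wavelengths.
m_tot = 0.001
l_tots = np.linspace(0.0, 0.002, 11)
wavelengths = np.linspace(400.0, 700.0, 31)

def band(centre, height, width=40.0):
    """Gaussian absorption band used to build invented spectra."""
    return height * np.exp(-0.5 * ((wavelengths - centre) / width) ** 2)

A_true = np.vstack([band(450.0, 800.0), band(550.0, 500.0), band(620.0, 1500.0)])
rng = np.random.default_rng(0)
D = profiles(10.0 ** 3.5, m_tot, l_tots) @ A_true
D = D + rng.normal(0.0, 1e-4, D.shape)   # small measurement noise

# Scan log10 K and keep the value with the smallest sum of squares.
log_ks = np.linspace(2.0, 5.0, 301)
ssqs = np.array([ssq_direct(lk, D, m_tot, l_tots) for lk in log_ks])
best_log_k = float(log_ks[np.argmin(ssqs)])
print(f"best log10 K = {best_log_k:.2f}, ssq = {ssqs.min():.3g}")
```

Only the non-linear parameter is searched; the spectra are recomputed by linear regression at every trial value, so the number of iteratively refined parameters stays at one regardless of the number of wavelengths.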
The iterative refinement of the parameters K can be essentially identical to the algorithm used in the Malinowski approach. This procedure bypasses the WFA calculations. Recall that the WFA calculations (see the note above) are not completely automatic; they require user input and can therefore be error-prone.
Model-based data fitting, also called hard-modelling, is much older than the soft-modelling approaches. It is well established in experimental chemistry, and there are many useful publications. 14-18 Should we forget WFA and all the other similar model-free analyses? Of course not! They are very important and powerful tools in the chemists' and chemometricians' toolbox. In many instances, there is no chemical model that allows a quantitative analysis in terms of some constants; model-free analyses are then the only option. Additionally, if there is a chemical model, model-free analyses are very powerful tools for the determination or at least the development of the correct chemical model.
Before finishing this paper, we would like to point out some additional pioneering aspects of Edmund R. Malinowski's efforts to improve the chemistry-based analysis of measured data. (a) In the application of the law of mass action, he based the equations on activities rather than on the far more common approximation of using concentrations. 7,9 Admittedly, this is far more common in the chemical engineering world than it was in chemistry at the time. This has since changed, and activity-based analysis software is readily available. 19 (b) In order to render the analyses more robust, he often used global analyses requiring data fusion principles. 20,21 Interestingly, the globalisation of analyses was also developed independently around the same time, in soft-modelling 22 as well as in hard-modelling. 23
And one last word of praise for the pioneering work of Edmund R. Malinowski. He was not only the grandfather of factor analysis in chemistry but also an eminent chemist, deeply interested in the chemistry of the processes he investigated. Edmund R. Malinowski showed us that chemometrics methods can support the quantitative understanding of chemistry.