Rigorous and compliant approaches to one-class classification
Introduction
One-class classification (OCC) [1], [2] consists of describing a target class of objects and detecting whether a new object resembles this class or not. The term class modelling is often used to denote OCC methods [3]. In a sense, this approach is the opposite of the discrimination problem, which is to allocate a new object to one of a set of distinct and exhaustive classes [4]. The critical difference between OCC and discriminant analysis (DA) is that the OCC model is developed using target class samples only.
The work of Harold Hotelling on multivariate quality control (1947) can be considered the first example of multivariate one-class classification in chemistry [5]. The unequal class models (UNEQ) method was developed by Derde and Massart (1986) as an evolution of these concepts [6]. This method, which is closely related to quadratic discriminant analysis (QDA), assumes that the class to be modelled follows a multivariate normal distribution and defines the width of the class space from Hotelling's T² statistic at a selected confidence level.
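To illustrate the T²-based boundary that UNEQ builds on, the following minimal sketch fits a class model on target samples only and accepts a new object when its Hotelling T² does not exceed a critical value. It assumes more samples than variables and uses the standard prediction-region limit for a new observation, T²_crit = p(n − 1)(n + 1) / (n(n − p)) · F_{1−α}(p, n − p); the exact limit used in the original UNEQ papers may differ, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_uneq(X_target, alpha=0.05):
    """Fit a Hotelling T^2 class model on target-class samples only.

    Returns (mean, inverse covariance, critical T^2). Requires n > p.
    """
    X = np.asarray(X_target, dtype=float)
    n, p = X.shape
    if n <= p:
        raise ValueError("a T^2 class model needs more samples than variables")
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))   # unbiased (n - 1) covariance
    # Prediction-region limit for a new observation at confidence level 1 - alpha
    f_crit = stats.f.ppf(1 - alpha, p, n - p)
    t2_crit = p * (n - 1) * (n + 1) / (n * (n - p)) * f_crit
    return mean, cov_inv, t2_crit


def t2_statistic(X_new, mean, cov_inv):
    """Hotelling T^2 of each row of X_new with respect to the class model."""
    D = np.atleast_2d(np.asarray(X_new, dtype=float)) - mean
    return np.einsum("ij,jk,ik->i", D, cov_inv, D)


def uneq_accept(X_new, model):
    """True where a sample falls inside the class space."""
    mean, cov_inv, t2_crit = model
    return t2_statistic(X_new, mean, cov_inv) <= t2_crit


# Usage on synthetic data: the class centre is accepted, a distant point rejected
rng = np.random.default_rng(0)
model = fit_uneq(rng.normal(size=(200, 3)), alpha=0.05)
print(uneq_accept([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]], model))
```

Under the normality assumption, roughly a fraction 1 − α of new target samples should fall inside the boundary; with spectral data, where variables usually outnumber samples, the covariance matrix is singular and some form of variable reduction is needed first.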
The first method specifically developed for one-class classification in chemometrics was soft independent modelling of class analogy (SIMCA), by Svante Wold [7], [8]. This method performs principal component analysis (PCA) on the samples of the class to be modelled, and the SIMCA model is defined as the range of sample scores on the significant principal components (PCs). A critical distance, at a given confidence level, is obtained by applying the Fisher F statistic to the residuals of each training sample from the model, and is used to define the boundaries of the SIMCA class space around the model.
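A comparable sketch of classical SIMCA, under stated assumptions: PCA via SVD on the centred target data, a residual variance per sample s_i² = Σ_j e_ij² / (p − A), a pooled training residual variance s_0² = Σ_i Σ_j e_ij² / ((p − A)(n − A − 1)), and acceptance when s_i² / s_0² does not exceed F_{1−α}(p − A, (p − A)(n − A − 1)) and the scores stay within the training score range. These are common textbook choices, not necessarily those of [7], [8]; refinements such as enlarging the score range are omitted, and all names are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_simca(X_target, n_pcs, alpha=0.05):
    """Classical SIMCA-style class model built on target-class samples only.

    Assumes p > n_pcs and n > n_pcs + 1.
    """
    X = np.asarray(X_target, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # PCA via SVD
    P = Vt[:n_pcs].T                                    # p x A loadings
    T = Xc @ P                                          # training scores
    E = Xc - T @ P.T                                    # training residuals
    df_res = p - n_pcs
    df_pooled = df_res * (n - n_pcs - 1)
    s0_sq = (E ** 2).sum() / df_pooled                  # pooled residual variance
    f_crit = stats.f.ppf(1 - alpha, df_res, df_pooled)
    return {"mean": mean, "P": P, "t_min": T.min(axis=0), "t_max": T.max(axis=0),
            "s0_sq": s0_sq, "f_crit": f_crit, "df_res": df_res}


def simca_accept(X_new, model):
    """True where a sample passes the residual F-test and lies in the score range."""
    Xc = np.atleast_2d(np.asarray(X_new, dtype=float)) - model["mean"]
    T = Xc @ model["P"]
    E = Xc - T @ model["P"].T
    s_sq = (E ** 2).sum(axis=1) / model["df_res"]       # residual variance per sample
    passes_f = s_sq / model["s0_sq"] <= model["f_crit"]
    in_range = np.all((T >= model["t_min"]) & (T <= model["t_max"]), axis=1)
    return passes_f & in_range
```

A sample lying exactly on the PC subspace inside the score range has zero residual and is accepted, whereas a sample displaced orthogonally to the model is rejected once its residual variance is large relative to the pooled training value.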
OCC modelling is a relatively new strategy compared with DA. The classical OCC version does not use any information about non-target (extraneous) classes, even when data on such classes are available. We call such an approach a ‘rigorous’ one. To contribute to the development of OCC techniques, we consider the outcomes obtained when the rigorous concept is violated. The most common violation, which we call a ‘compliant’ approach, makes use of relevant non-target information that can influence the results of the OCC modelling.
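The difference between the two approaches can be made concrete for a PCA-based class model whose free parameter, at fixed α, is the number of PCs. In the ‘rigorous’ approach that number is chosen from target-class data alone; a ‘compliant’ variant may also look at non-target samples. The sketch below assumes one such criterion (maximise specificity on non-target samples while validation sensitivity stays within a tolerance `tol` of 1 − α). The criterion, the helper name `tune_n_pcs_compliant` and the `fit(X, n_pcs, alpha)` / `accept(X, model)` interface are hypothetical, not the procedure used in this study.

```python
import numpy as np


def tune_n_pcs_compliant(X_train, X_val, X_nontarget, fit, accept,
                         max_pcs, alpha=0.01, tol=0.02):
    """Hypothetical 'compliant' tuning of the number of PCs.

    fit(X, n_pcs, alpha) -> model and accept(X, model) -> boolean array are
    assumed callables wrapping any PCA-based one-class model.
    Returns the chosen number of PCs and a list of
    (n_pcs, train_sensitivity, val_sensitivity, specificity) tuples.
    """
    results = []
    for n_pcs in range(1, max_pcs + 1):
        model = fit(X_train, n_pcs, alpha)              # built on target samples only
        sens_train = float(np.mean(accept(X_train, model)))
        sens_val = float(np.mean(accept(X_val, model)))
        # Non-target samples enter only here: this is what makes the tuning 'compliant'
        spec = 1.0 - float(np.mean(accept(X_nontarget, model)))
        results.append((n_pcs, sens_train, sens_val, spec))

    # Keep models whose validation sensitivity stays close to the nominal 1 - alpha
    admissible = [r for r in results if r[2] >= 1.0 - alpha - tol]
    # Highest specificity wins; max() keeps the first (smallest n_pcs) on ties.
    # If no model is admissible, fall back to all candidates.
    best = max(admissible or results, key=lambda r: r[3])
    return best[0], results
```

A ‘rigorous’ counterpart would drop `X_nontarget` entirely and fix the number of PCs from target-class criteria alone.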
The main objective of the present study is to compare the outcomes of the ‘rigorous’ and ‘compliant’ approaches. For this purpose, two different OCC methods are employed, namely partial least squares density modelling (PLS-DM) [9] and data-driven soft independent modelling of class analogy (DD-SIMCA) [10]. The methods are described in Sections 3.1 and 3.2. An additional goal is to compare these techniques on two real-world examples.
Figures of merit
The performance of one-class classifiers is usually reported using two parameters: sensitivity and specificity. Sensitivity is the fraction of target-class samples that are correctly recognised as consistent with the model. It can also be defined as the true positive rate and is therefore complementary to the type I error (i.e., the false negative rate). Specificity is the fraction of samples extraneous to the target class that are correctly recognised as inconsistent with the model.
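Both figures of merit reduce to simple counts over the model's accept/reject decisions. A minimal sketch, assuming the decisions are already available as booleans (the function names are illustrative):

```python
def sensitivity(accepted_target):
    """Fraction of target-class samples accepted by the model.

    accepted_target: sequence of booleans, True = recognised as consistent
    with the model. 1 - sensitivity is the observed type I error.
    """
    return sum(bool(a) for a in accepted_target) / len(accepted_target)


def specificity(accepted_extraneous):
    """Fraction of non-target (extraneous) samples rejected by the model.

    accepted_extraneous: sequence of booleans, True = (wrongly) accepted.
    1 - specificity is the observed type II error.
    """
    return sum(not a for a in accepted_extraneous) / len(accepted_extraneous)


# Example: 95 of 100 target samples accepted, 2 of 20 extraneous samples accepted
print(sensitivity([True] * 95 + [False] * 5))   # 0.95
print(specificity([False] * 18 + [True] * 2))   # 0.9
```

Comparing 1 − sensitivity on target samples with the a priori α is a direct check of whether a model's nominal type I error is realised in practice.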
Materials
We consider two different datasets. The first, Olives, consists of samples of natural origin (olives in brine), for which variability among samples is inevitable. In the present study, variability is taken into account both within a single harvest year and between different harvest years. The second dataset, Remedy, consists of samples of artificial origin (uncoated tablets). Here, variability between samples is much lower and manifests mainly as variation between batches.
DD-SIMCA
As mentioned above, two types of models, ‘rigorous’ and ‘compliant’, are considered. The results regarding model sensitivity are presented in Table 3. The best results for the ‘rigorous’ model are obtained with 3 PCs and a type I error α = 0.01. Both a priori α values are in good agreement with the a posteriori sensitivity calculated for subsets T1, I1 and E1.
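The DD-SIMCA acceptance area can be sketched as follows, assuming the commonly described formulation: the score distance h and the orthogonal distance v are each modelled as scaled chi-squared variables, with scales h0, v0 and degrees of freedom Nh, Nv estimated here by the method of moments, and a sample is accepted when Nh·h/h0 + Nv·v/v0 ≤ χ²_{1−α}(Nh + Nv). The published method also offers robust estimators and an outlier threshold, which are omitted; all names are hypothetical.

```python
import numpy as np
from scipy import stats


def _scaled_chi2_params(d):
    """Method-of-moments fit of d ~ (d0 / N) * chi2(N): returns (d0, N)."""
    d0 = float(np.mean(d))
    n_dof = max(1, int(round(2.0 * d0 ** 2 / float(np.var(d, ddof=1)))))
    return d0, n_dof


def _distances(X, model):
    """Score distance h and orthogonal distance v of each row of X."""
    Xc = np.atleast_2d(np.asarray(X, dtype=float)) - model["mean"]
    T = Xc @ model["P"]
    E = Xc - T @ model["P"].T
    h = (T ** 2 / model["lam"]).sum(axis=1)
    v = (E ** 2).sum(axis=1)
    return h, v


def fit_dd_simca(X_target, n_pcs, alpha=0.01):
    """Fit a DD-SIMCA-style acceptance area on target-class samples only."""
    X = np.asarray(X_target, dtype=float)
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    model = {"mean": mean, "P": Vt[:n_pcs].T, "lam": s[:n_pcs] ** 2}
    h, v = _distances(X, model)
    model["h0"], model["Nh"] = _scaled_chi2_params(h)
    model["v0"], model["Nv"] = _scaled_chi2_params(v)
    # Full distance c ~ chi2(Nh + Nv); critical value for type I error alpha
    model["c_crit"] = stats.chi2.ppf(1 - alpha, model["Nh"] + model["Nv"])
    return model


def dd_simca_accept(X_new, model):
    """True where the full distance c does not exceed the critical value."""
    h, v = _distances(X_new, model)
    c = model["Nh"] * h / model["h0"] + model["Nv"] * v / model["v0"]
    return c <= model["c_crit"]
```

Because α enters only through `c_crit`, the fraction of training samples accepted should stay close to 1 − α, which is the kind of a priori versus a posteriori agreement described above.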
At the same time, specificity is not completely satisfactory (see Fig. 2a and Table 4). Misclassifications originate from
Results for dataset Remedy
Unlike the Olives case, here we consider three peer subsets corresponding to three different manufacturers. The samples from each manufacturer are treated in turn as the target class, so that three OCC models are built.
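With each manufacturer taken in turn as the target class, the results naturally form a 3 × 3 table: each model is evaluated on its own test samples (sensitivity) and on those of the other two manufacturers (specificity). A minimal sketch of that bookkeeping; the `fit`/`accept` interface and the min/max box model used as a stand-in class model are hypothetical, not DD-SIMCA or PLS-DM.

```python
import numpy as np


def cross_class_table(groups, fit, accept):
    """Take each group in turn as the target class and tabulate acceptance rates.

    groups: dict name -> (X_train, X_test); fit(X) -> model and
    accept(X, model) -> boolean array are assumed callables.
    table[target, other] is the fraction of `other`'s test samples accepted by
    the model of `target`: the diagonal estimates sensitivity, and
    1 - table[target, other] (other != target) estimates specificity.
    """
    table = {}
    for target, (X_train, _) in groups.items():
        model = fit(X_train)                 # target-class samples only
        for other, (_, X_test) in groups.items():
            table[target, other] = float(np.mean(accept(X_test, model)))
    return table


# Hypothetical stand-in class model: per-variable min/max box of the training data
def fit_box(X):
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)


def accept_box(X, model):
    lo, hi = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.all((X >= lo) & (X <= hi), axis=1)
```

In practice `fit` and `accept` would wrap the OCC method under study; the table layout stays the same regardless of the model.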
Discussion
Comparing the two OCC methods, we can conclude that DD-SIMCA is a global modelling method, while PLS-DM is a local approach. At a fixed type I error α, the first method has only one free parameter, the number of PCs, which can be tuned in the ‘compliant’ approach. As the number of PCs increases, training sensitivity fluctuates around the nominal sensitivity level (1 − α), while validation sensitivity decreases. These tendencies are observed because of evident
Conclusions
A distinctive feature of OCC is the possibility of building a model for one class without in-depth information about other classes or samples. In the ‘rigorous’ OCC approach, all model parameters and validation procedures are based solely on information about the target class. This can be considered an advantage of OCC, especially for solving authentication problems. At the same time, it is a drawback for overlapping datasets. When the classes under study are well separated, the
Acknowledgments
Financial support by the Italian Ministry of Education, Universities and Research (MIUR) is acknowledged – Research Project SIR 2014 “Advanced strategies in near infrared spectroscopy and multivariate data analysis for food safety and authentication”, RBSI14CJHJ (CUP: D32I15000150008).
References (18)
- Comparison of the performance of the class modelling techniques UNEQ, SIMCA, and PRIMA. Chemom. Intell. Lab. Syst. (1988)
- Discriminant analysis is an inappropriate method of authentication. TrAC Trends Anal. Chem. (2016)
- UNEQ: a disjoint modelling technique for pattern recognition based on normal distribution. Anal. Chim. Acta (1986)
- Pattern recognition by means of disjoint principal components models. Pattern Recognit. (1976)
- Partial least squares density modeling (PLS-DM) – a new class-modeling strategy applied to the authentication of olives in brine by near-infrared spectroscopy. Anal. Chim. Acta (2014)
- Multivariate class modeling for the verification of food-authenticity claims. TrAC Trends Anal. Chem. (2012)
- Quantitative risk assessment in classification of drugs with identical API content. J. Pharm. Biomed. Anal. (2014)
- One-class classifier networks for target recognition applications. In: International Neural Network Society